<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Tinkerer]]></title><description><![CDATA[The latest updates on data science, data engineering and data analysis - for free!]]></description><link>https://www.datatinkerer.io</link><image><url>https://substackcdn.com/image/fetch/$s_!JEdj!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png</url><title>Data Tinkerer</title><link>https://www.datatinkerer.io</link></image><generator>Substack</generator><lastBuildDate>Sun, 05 Apr 2026 13:29:30 GMT</lastBuildDate><atom:link href="https://www.datatinkerer.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Data Tinkerer]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[datatinkerer@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[datatinkerer@substack.com]]></itunes:email><itunes:name><![CDATA[Data Tinkerer]]></itunes:name></itunes:owner><itunes:author><![CDATA[Data Tinkerer]]></itunes:author><googleplay:owner><![CDATA[datatinkerer@substack.com]]></googleplay:owner><googleplay:email><![CDATA[datatinkerer@substack.com]]></googleplay:email><googleplay:author><![CDATA[Data Tinkerer]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[What the Data Crowd Was Reading in March 2026]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in March 2026.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-march-2026</link><guid 
isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-march-2026</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 02 Apr 2026 03:30:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e2a58f73-a2c0-4074-8db5-67153f3c5d45_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e2U5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e2U5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!e2U5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e2U5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!e2U5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682cbb58-0721-4a68-871d-0258d12f7e6d_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for March!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://www.newsletter.swirlai.com/p/state-of-context-engineering-in-2026">State of Context Engineering in 2026</a> (12 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Aurimas Grici&#363;nas&quot;,&quot;id&quot;:14122259,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F746f0396-fc7f-4690-b75c-ef482a8cb1c7_3684x3683.jpeg&quot;,&quot;uuid&quot;:&quot;d03a3318-95a8-4019-b842-e5f2c246d90e&quot;}" data-component-name="MentionToDOM">Aurimas Grici&#363;nas</span> argues that context engineering is evolving from prompt tinkering into a structured discipline where managing memory, retrieval and state becomes the core challenge of building reliable AI systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fhut!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!fhut!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 424w, https://substackcdn.com/image/fetch/$s_!fhut!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 848w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1272w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fhut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp" width="1456" height="827" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:827,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fhut!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 424w, https://substackcdn.com/image/fetch/$s_!fhut!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 848w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1272w, https://substackcdn.com/image/fetch/$s_!fhut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae1b9f3-19f6-4e10-a7c3-ad53b06c1418_1456x827.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li>
<li><p><strong><a href="https://nchagnet.pages.dev/blog/bayesian-statistics-for-confused-data-scientists">Bayesian statistics for confused data scientists</a> (15 minute read)<br></strong>Nicolas Chagnet explains Bayesian statistics in plain terms, showing that its real strength lies not in mathematical elegance but in giving data scientists a cleaner way to reason about uncertainty and sparse real-world data.</p></li><li><p><strong><a href="https://www.decodingai.com/p/agentic-ai-engineering-guide-6-mistakes">Agentic AI Engineering Guide</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Paul Iusztin&quot;,&quot;id&quot;:110559689,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0714d360-396c-4b41-a676-1b58dc1dc5f3_1470x1470.jpeg&quot;,&quot;uuid&quot;:&quot;a40f7d2d-1ffa-4921-80fe-eb7c7912ff68&quot;}" data-component-name="MentionToDOM">Paul Iusztin</span> argues that most agentic AI systems fail not because the model is weak, but because teams make avoidable engineering mistakes around context, architecture, planning and evaluation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FSYK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!FSYK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 424w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 848w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1272w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FSYK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp" width="1096" height="549" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:549,&quot;width&quot;:1096,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FSYK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 424w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 848w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1272w, https://substackcdn.com/image/fetch/$s_!FSYK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00301e82-2fba-4a01-b821-9045b7938cfd_1096x549.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li>
<li><p><strong><a href="https://jessicatalisman.substack.com/p/the-context-problem">The Context Problem</a> (29 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jessica Talisman&quot;,&quot;id&quot;:24176542,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zEsI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f1fe4e-779e-4a27-be92-71fac460ee01_935x935.jpeg&quot;,&quot;uuid&quot;:&quot;0da07ecd-fbbe-4d3e-a0c8-500b6799c802&quot;}" data-component-name="MentionToDOM">Jessica Talisman</span> argues that the AI industry has turned context into a token-priced billing unit, even though context should really mean the relational structure that makes information coherent and useful.</p></li><li><p><strong><a href="https://magazine.sebastianraschka.com/p/visual-attention-variants">A Visual Guide to Attention Variants in Modern LLMs</a> (26 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;a0f35f83-e9bb-408b-a811-d935e0af6c10&quot;}" data-component-name="MentionToDOM">Sebastian Raschka</span> shows how modern LLM attention has evolved from standard 
multi-head attention into a growing set of variants each designed to balance quality, memory use and inference efficiency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s-uv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s-uv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 848w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s-uv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg" width="1456" height="949" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:949,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172464,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s-uv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 848w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s-uv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98bbb1b3-72dd-42c0-967c-f20689e839d9_1494x974.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li>
<li><p><strong><a href="https://amandeepsp.github.io/blog/high-dims">The Boon of Dimensionality</a> (6 minute read)</strong><br>Amandeep Singh shows that high-dimensional space creates the geometric conditions that make embeddings, random projections and feature separation work in modern machine learning.</p></li><li><p><strong><a href="https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex">GPT 5.4 is a big step for Codex</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Nathan Lambert&quot;,&quot;id&quot;:10472909,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!RihO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fedcdfb-e137-4f6a-9089-a46add6c6242_500x500.jpeg&quot;,&quot;uuid&quot;:&quot;5138e0c6-b974-4dfe-9416-12c648293080&quot;}" data-component-name="MentionToDOM">Nathan Lambert</span> writes that GPT 5.4 feels like a real step forward for Codex, with gains in usability, speed, context handling and agent reliability that matter more in practice than benchmark scores alone.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents">How Shopify Scales Taxonomy Evolution Across 10,000+ Categories With Multi-Agent AI</a> (14 minute read)<br></strong>This piece breaks down how Shopify moved from reactive manual updates to a multi-agent system that scans taxonomy branches in parallel, proposes new categories/attributes from merchant data, detects duplicates and runs automated QA through domain-specific judges.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://seattledataguy.substack.com/p/layer-by-layer-we-built-data-systems">Layer by Layer, We Built Data Systems No One Understands</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;776d9623-2ff5-4cb6-875f-8f43c8ab263f&quot;}" data-component-name="MentionToDOM">SeattleDataGuy</span> writes that modern data stacks keep piling on layers in the name 
of simplicity but the result is often more sprawl, more cost and systems that are harder to understand or tie back to business outcomes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8W02!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8W02!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!8W02!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8W02!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:47330,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8W02!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!8W02!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!8W02!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d9bc5fb-0da4-429b-a887-ee35e2cfc412_1024x768.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li>
<li><p><strong><a href="https://www.dataengineeringweekly.com/p/etl-is-dead">ETL is Dead</a> (12 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ananth Packkildurai&quot;,&quot;id&quot;:3520227,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mRE-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f38fa68-8a30-4357-a48e-6833efe28c0f_989x989.jpeg&quot;,&quot;uuid&quot;:&quot;96474d3c-2a11-4127-a971-49633612ee83&quot;}" data-component-name="MentionToDOM">Ananth Packkildurai</span> argues that while ETL is not disappearing in volume, it is fading as the core identity of data engineering, with AI shifting the real work toward context and 
semantics.</p></li><li><p><strong><a href="https://pipeline2insights.substack.com/cp/189784942">The Data Engineer&#8217;s GitHub Portfolio (2026 Edition)</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Yordan Ivanov&quot;,&quot;id&quot;:40945395,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ma-p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76f52904-5428-4d97-82a5-3faa722b8d46_2234x1253.jpeg&quot;,&quot;uuid&quot;:&quot;3ba92759-3c5c-463f-afd4-48c62129a1c6&quot;}" data-component-name="MentionToDOM"></span> writes that a strong data engineering GitHub portfolio should prove technical taste, system thinking and real-world problem solving, not just show a pile of tutorial projects.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wkgQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wkgQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 424w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 848w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp" width="700" height="609" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:609,&quot;width&quot;:700,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wkgQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 424w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 848w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 
1272w, https://substackcdn.com/image/fetch/$s_!wkgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ea9aaa-bfb2-4818-b107-f628731493fd_700x609.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://buildtolaunch.substack.com/cp/191744000">The Data Engineering Mindset Every AI Builder Needs</a> (14 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Erfan 
Hesami&quot;,&quot;id&quot;:277538242,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!rcW2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2692f-48e0-43a5-9f33-7eebb007bd6e_1641x1641.jpeg&quot;,&quot;uuid&quot;:&quot;a2ee5071-74dc-4515-829b-b04cb357605c&quot;}" data-component-name="MentionToDOM"></span> writes that most AI products do not break because of the model but because builders ignore the data foundations early on, especially data flow design, data quality and monitoring.</p></li><li><p><strong><a href="https://pthorpe92.dev/databasemaxxing/">The absolute beginners guide to databasemaxxing</a> (18 minute read)<br></strong>This article walks through database internals from a beginner&#8217;s perspective, showing how concepts like parsing, binding, scans and index seeks fit together under the hood.</p></li><li><p><strong><a href="https://ghostinthedata.info/posts/2026/2026-03-14-your-data-model-isnt-broken-part-1/">Your Data Model Isn&#8217;t Broken, Part I: Why Refactoring Beats Rebuilding</a> (12 minute read)<br></strong>Chris Hillman makes the case that most broken data models are really bundles of hard-won business knowledge and that careful refactoring is usually smarter than blowing everything up and starting again.</p></li><li><p><strong><a href="https://vutr.substack.com/p/clickhouse-real-time-insight-in-15">ClickHouse -&gt; Real-time insight in 15 minutes</a> (19 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Phi Vu 
Trinh&quot;,&quot;id&quot;:167177248,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!UWAa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4805f673-db97-4f7c-85c4-44b345a8de80_256x256.png&quot;,&quot;uuid&quot;:&quot;3babf9ab-7ba3-449f-b67e-f70c16a65ce8&quot;}" data-component-name="MentionToDOM"></span> shows that ClickHouse is built for real-time analytics, but getting that performance in production usually means taking on enough operational complexity to make platforms like Tinybird appealing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QNqF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QNqF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 848w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1272w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QNqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp" width="1456" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38334,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QNqF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 848w, https://substackcdn.com/image/fetch/$s_!QNqF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!QNqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe71616cf-67b4-4591-ad4e-90961030734e_1456x1040.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces">How Notion Scaled AI Q&amp;A to Millions of Workspaces</a> (14 minute read)<br></strong>This article walks through how Notion scaled AI Q&amp;A to millions of workspaces while increasing onboarding throughput 600x and cutting costs by up to 
90%.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://thedataecosystem.substack.com/p/issue-53-business-models-and-data">Relevance of Business Models for Data</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Dylan Anderson&quot;,&quot;id&quot;:14172622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128526c2-c66d-497b-ab50-f95deb8ce0fc_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;e33430e3-f195-49f5-9e0d-371a21acad16&quot;}" data-component-name="MentionToDOM"></span> makes the case that data teams should start with the business model first, because strategy, architecture, governance and analytics all work better when they are tied to how the company actually creates and captures value.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f6qe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f6qe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 424w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 848w, 
https://substackcdn.com/image/fetch/$s_!f6qe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1272w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f6qe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp" width="1351" height="652" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:652,&quot;width&quot;:1351,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134148,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f6qe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 424w, 
https://substackcdn.com/image/fetch/$s_!f6qe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 848w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1272w, https://substackcdn.com/image/fetch/$s_!f6qe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e70f19f-300f-4a7d-b313-0dd3f47f89db_1351x652.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://hipster-data-show.ghost.io/its-about-the-strategy-stupid/">It&#8217;s about the strategy, stupid</a> (14 minute read)<br></strong>Timo Dechau makes the case that data work becomes far more useful when it starts with business strategy, not with dashboards, tracking audits or whatever tactic happens to be fashionable.</p></li><li><p><strong><a href="https://epochai.substack.com/p/what-do-frontier-ai-companies-job">What do frontier AI companies&#8217; job postings reveal about their plans?</a> (9 minute read)</strong><br>Interesting article suggesting that frontier labs&#8217; job postings reveal where the market is heading, with hiring patterns pointing to heavier go-to-market pushes, new product bets and different strategies for securing compute and data.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8UOo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8UOo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 424w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 848w, 
https://substackcdn.com/image/fetch/$s_!8UOo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1272w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8UOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp" width="1026" height="1283" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1283,&quot;width&quot;:1026,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/192483521?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8UOo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 424w, 
https://substackcdn.com/image/fetch/$s_!8UOo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 848w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1272w, https://substackcdn.com/image/fetch/$s_!8UOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b5612d7-5fac-4abd-abc8-846505c60bda_1026x1283.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:485645}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from March I missed? Feel free to drop a comment or hit reply, even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others, really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;79d232de-3d15-4abe-b46e-997945fb8a86&quot;,&quot;caption&quot;:&quot;It&#8217;s time for another data/AI roundup and here are the highlights from February&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />Inside OpenAI&#8217;s in-house data agent<br />A practical guide to which AI to use in the agentic era<br />Why judgment may not be uniquely human after all<br />How Codex is being used for serious research automation<br />Why semantic linking matters for giving data meaning<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />A portable analytics stack built on DuckDB, DuckLake, dlt and SQLMesh<br />Why healing tables beat slow-motion backfill disasters<br />The case for MetadataOps engineers<br />How to use AI tools without losing data engineering fundamentals<br />Why 5-second BigQuery queries can still be expensive<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />The state of machine learning competitions in 2025<br /><br />Plus: why AI is eating software&#8217;s TAM, what world models could unlock in robotics and why AI may intensify work instead of reducing it.&quot;,&quot;cta&quot;:&quot;Read full 
story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in February 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-12T04:00:47.277Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9b4d29f-5a76-44a8-902d-bc2983dbe445_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-february-2026&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:190247984,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:8,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1b7b5909-6ab4-4a88-98a1-d09e96554f4d&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from January&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; 
&#119808;&#119816;<br />Why &#8216;use agents or be left behind&#8217; is mostly about practical automation<br />Piecewise regression for spotting regime shifts in time series<br />Why AI benchmarks are hitting a measurement wall<br />What the data actually says about the state of open models<br />How large-scale recommendation systems are built in the real world<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />How Unity Catalog really works under the hood<br />Databricks Lakeflow vs Airflow in practice<br />End-to-end agentic data modeling with OpenMetadata<br />A candid look at the day-to-day reality of data engineering<br />How Uber cut data lake freshness from hours to minutes with Flink<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />The best data visualization projects of 2025<br />Why storytelling matters more than chart tricks<br />Designing more accessible line charts<br />Practical rules for dashboard filter placement<br /><br />Plus: ontologies explained, hard lessons from building AI agents in finance and new data on who&#8217;s really buying AI compute.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in January 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-05T03:20:52.027Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8f4e23e-4e9e-4420-bbed-d16a4d242c7d_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:186553359,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Notion Scaled AI Q&A to Millions of Workspaces]]></title><description><![CDATA[Kafka, Spark and Ray powering low-latency, high-throughput search pipelines]]></description><link>https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 26 Mar 2026 04:00:33 GMT</pubDate><enclosure 
url="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Notion scaled its AI Q&amp;A to millions of users.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HsBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HsBV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HsBV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!HsBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F084aaae7-58c7-464e-b0f8-7c31523985ef_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3840" height="2160" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2160,&quot;width&quot;:3840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a black and white block with the letter n on it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a black and white block with the letter n on it" title="a black and white block with the letter n on it" srcset="https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1683114010575-3ead4403180e?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxub3Rpb258ZW58MHx8fHwxNzc0MjU5NjgzfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@maria_shalabaieva">Mariia Shalabaieva</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>Now, with that out of the way, let&#8217;s get to Notion&#8217;s AI Q&amp;A level up!</p><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Notion launched AI Q&amp;A on top of 
vector search and quickly faced massive demand across millions of workspaces. The initial system hit limits in capacity, onboarding speed and cost.</p><h4><strong>Task</strong></h4><p>Scale onboarding, keep indexes fresh and reduce rising infrastructure costs. At the same time, simplify an increasingly complex architecture without hurting latency.</p><h4><strong>Action</strong></h4><p>They introduced dual ingestion paths and generation-based indexing, moved to a serverless architecture and migrated to turbopuffer. They then reduced recomputation with page state tracking and moved embedding workloads to Ray for unified compute.</p><h4><strong>Result</strong></h4><p>600x growth in daily onboarding capacity, 15x growth in active workspaces and major cost reductions across layers. Latency improved and the system became simpler and more efficient.</p><h4><strong>Use Cases</strong></h4><p>Real-time search indexing, semantic search, document retrieval</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, AWS EMR, Apache Airflow, Apache Kafka, AWS S3, DynamoDB, Ray, turbopuffer</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>When <a href="https://www.notion.com/blog/introducing-q-and-a">Notion launched AI Q&amp;A</a> in November 2023, the core idea sounded simple enough: let people ask natural-language questions and retrieve relevant knowledge from across their workspace and connected tools. In practice, that meant building a vector search system that could ingest huge amounts of content, stay fresh as pages changed and do all of it at a cost that made sense at Notion scale.</p><p>That is the real story here. It is not just that &#8220;vector search powers AI&#8221;; it is what happens after launch, when adoption jumps faster than expected and the infrastructure underneath has to keep up. 
Over two years, the Notion team pushed that system through several big transitions: scaling onboarding, dealing with storage pressure, changing database architecture, reworking indexing logic and moving embeddings workloads onto Ray. The headline numbers are hard to ignore: 10x scale and roughly one-tenth the cost.</p><p>This is a good example of how modern AI infrastructure usually evolves. The first version gets the product live. The next few versions are about survival, then simplification, then cost, then latency, then getting rid of all the awkward bits that built up during the rush.</p><div><hr></div><h4>Vector search, explained through Notion&#8217;s lens</h4><p>Traditional keyword search is literal. It works when users type the exact words that exist in the content. It starts falling apart when the wording changes but the meaning stays the same. Someone searching for &#8220;team meeting notes&#8221; may still want a page called &#8220;group standup summary,&#8221; but keyword search does not naturally understand that those are closely related.</p><p>Vector search solves that by representing text as embeddings. Instead of storing only words, it maps text into a high-dimensional space where semantically similar ideas sit closer together. That means retrieval is based on meaning, not exact phrasing.</p><p>For Notion AI, this matters a lot. The system needs to answer questions in natural language by finding useful content across a workspace and even across connected sources like Slack and Google Drive. That is exactly the sort of setup where semantic retrieval becomes more useful than plain lexical matching. A user is not thinking about the title of the page or the exact phrasing inside a paragraph. 
They are asking a question in their own words and expecting the system to bridge the gap.</p><p>That expectation becomes expensive very quickly.</p><div><hr></div><h4>Part 1: Scaling beyond what the original system expected</h4><p>At launch, Notion&#8217;s ingestion and indexing pipeline had two paths.</p><p>The first was an offline path. Batch jobs running on Apache Spark would chunk existing documents, generate embeddings through an API and bulk-load those vectors into the vector database. This handled the heavy lifting for backfilling large amounts of existing content.</p><p>The second was an online path. Kafka consumers processed page edits in near real time so live workspaces stayed up to date with sub-minute latency.</p><p>It is a practical split. The offline side handles the backlog and large initial loads. The online side keeps things fresh once a workspace is active. Together, the two-path setup gave Notion a way to onboard workspaces at scale without sacrificing freshness for day-to-day edits.</p><p>The vector database itself ran on dedicated &#8216;pod&#8217; clusters, where storage and compute were coupled. 
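</p><p>Going back to the two ingestion paths described above, here is a minimal Python sketch of how an offline backfill and an online edit consumer can share one chunk-embed-write routine. It is an illustration under assumptions, not Notion&#8217;s code: the in-memory <code>VectorStore</code>, the fixed-size word chunking, the hash-based <code>embed</code> placeholder (standing in for a real embeddings API call) and every function name are hypothetical, and plain loops stand in for the Spark jobs and Kafka consumers.</p>

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Page:
    workspace_id: str
    page_id: str
    text: str


@dataclass
class VectorStore:
    # Hypothetical in-memory stand-in for the vector database.
    # Keys are (workspace_id, page_id, span_index); values are vectors.
    vectors: dict = field(default_factory=dict)

    def upsert(self, key, vector):
        self.vectors[key] = vector

    def delete_page(self, workspace_id, page_id):
        # Remove every span vector that belongs to one page.
        stale = [k for k in self.vectors if k[:2] == (workspace_id, page_id)]
        for k in stale:
            del self.vectors[k]


def chunk(text, max_words=50):
    # Naive fixed-size word chunking (an assumption; real chunking is likely smarter).
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(span, dims=8):
    # Placeholder "embedding": a deterministic hash, NOT a real model or API call.
    digest = hashlib.sha256(span.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dims]]


def index_page(store, page):
    # Shared by both paths: drop the page's old vectors, then chunk, embed and write.
    store.delete_page(page.workspace_id, page.page_id)
    for i, span in enumerate(chunk(page.text)):
        store.upsert((page.workspace_id, page.page_id, i), embed(span))


def offline_backfill(store, pages, batch_size=1000):
    # Offline path: bulk-process existing pages (Spark batch jobs in Notion's setup).
    for start in range(0, len(pages), batch_size):
        for page in pages[start:start + batch_size]:
            index_page(store, page)


def online_consume(store, edit_events):
    # Online path: re-index each edited page as events arrive (Kafka consumers in
    # Notion's setup). Each event is assumed to carry the full updated page.
    for event in edit_events:
        index_page(store, event)


if __name__ == "__main__":
    store = VectorStore()
    offline_backfill(store, [Page("ws1", "p1", "hello " * 120), Page("ws1", "p2", "short page")])
    online_consume(store, [Page("ws1", "p2", "short page, now edited with more words")])
    print(len(store.vectors))  # → 4 (three spans for p1, one for the edited p2)
```

<p>Because both paths call the same <code>index_page</code> routine, a page that was backfilled and later edited ends up represented the same way as one indexed only by the online path. Deleting a page&#8217;s old vectors before writing new ones keeps stale spans from lingering when an edit makes the page shorter.</p><p>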
The Notion team designed sharding in a way that echoed their Postgres setup: workspace ID was the partitioning key, routing used range-based partitioning and a single config referenced all shards.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zNlu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zNlu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 424w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 848w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1272w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif" width="1456" height="957" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zNlu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 424w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 848w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1272w, https://substackcdn.com/image/fetch/$s_!zNlu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff165c24e-314a-4bfb-ae67-84e69f2b5c63_2172x1428.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Pipelines writing into sharded vector database pods (Source: Notion)</figcaption></figure></div><p>That all made sense on paper. Then the product launched and demand was overwhelming.</p><p>Notion quickly built up a waitlist of millions of workspaces that wanted access to Q&amp;A. The problem was no longer whether the system worked. It was how fast it could onboard people without cracking under the pressure.</p><p><strong>When the indexes started to fill up</strong></p><p>Only a month after launch, the original indexes were already nearing capacity.</p><p>That is the kind of problem that sounds good in product meetings and bad in infrastructure meetings. If the indexes filled up, Notion would have to pause onboarding. 
That would slow down rollout and delay access for everyone waiting.</p><p>The team had two obvious options.</p><p>One was to re-shard incrementally. Clone data into another index, delete half, repeat and keep doing that every couple of weeks as new customers came in.</p><p>The other was to re-shard for the final expected volume. But their vector database provider charged for uptime, so over-provisioning would have been painfully expensive.</p><p>Instead, the Notion team went with a third approach. When a set of indexes got close to full, they provisioned a new set and directed all newly onboarded workspaces there. Each set was assigned a generation ID, which determined where reads and writes should go.</p><p>It is not the prettiest long-term design, but it was a smart short-term move. It avoided repeated re-shard operations and kept onboarding moving. Sometimes the right scaling decision is not the most elegant one. It is the one that buys breathing room without stopping the business.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a8zu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a8zu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 424w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 848w, 
https://substackcdn.com/image/fetch/$s_!a8zu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1272w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif" width="1456" height="891" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:891,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a8zu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 424w, 
https://substackcdn.com/image/fetch/$s_!a8zu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 848w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1272w, https://substackcdn.com/image/fetch/$s_!a8zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4100f229-aff6-4fd9-8fa3-f35e4878a0aa_1990x1218.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">New index &#8216;generations&#8217; added as capacity fills, routing new workspaces without re-sharding. (Source: Notion)</figcaption></figure></div><p><strong>Turning onboarding into a throughput problem</strong></p><p>Even with the architecture in place, the initial onboarding rate was nowhere near enough. At launch, Notion could onboard only a few hundred workspaces per day. At that pace, clearing a multi-million waitlist would have taken decades, which was obviously not a realistic option.</p><p>So the team pushed hard on throughput. Using Airflow scheduling, pipelining and Spark job tuning, they dramatically increased capacity.</p><p>The results were big:</p><ul><li><p>Daily onboarding capacity increased by <strong>600x</strong></p></li><li><p>Active workspaces grew <strong>15x</strong></p></li><li><p>Vector database capacity expanded <strong>8x</strong></p></li></ul><p>By April 2024, the Q&amp;A waitlist was cleared.</p><p>That is the kind of milestone that looks clean in hindsight, but it came at a cost. Managing multiple generations of databases helped during the hypergrowth phase, but it also added operational complexity and financial overhead. The team had solved the immediate scaling problem, but the architecture was starting to feel heavy.</p><p>That set up the next phase of the story.</p><div><hr></div><h4>Part 2: Cost becomes the next constraint</h4><p>In May 2024, Notion migrated its embeddings workload from the original dedicated &#8216;pod&#8217; architecture to a serverless setup that decoupled storage from compute and charged based on usage instead of uptime.</p><p>The effect was immediate. Costs dropped by 50 percent from peak usage, translating into several million dollars in annual savings.</p><p>That alone would have made the migration worthwhile, but the serverless design also fixed two practical problems. 
First, it removed the storage capacity constraints that had become a serious scaling bottleneck. Second, it simplified operations because the team no longer had to provision capacity ahead of demand.</p><p>Even after cutting costs in half, the annual run rate for vector database spend remained in the millions. From an engineering point of view, this is where things get interesting. The easy win had already happened. Now the team had to go after deeper structural gains.</p><p><strong>A new search foundation (turbopuffer)</strong></p><p>While working on the first round of savings, Notion also evaluated alternative search engines. <a href="https://turbopuffer.com/">turbopuffer</a> stood out because it offered significantly lower projected costs.</p><p>At the time, turbopuffer was a newer player in search. Its architecture was built on object storage with a focus on cost-efficiency and performance. It also supported both managed and bring-your-own-cloud deployment models, and it made bulk modification of stored vector objects easier.</p><p>That combination lined up well with what Notion needed.</p><p>After a successful evaluation, the team decided to migrate its entire multi-billion-object workload to turbopuffer in late 2024. Since they were already switching providers, they used the migration as a chance to clean up the broader architecture too.</p><p>Several changes happened together.</p><p>First, they fully re-indexed the corpus, increasing write throughput in the offline indexing pipeline to rebuild everything in turbopuffer.</p><p>Second, they upgraded to a more performant embeddings model as part of the migration.</p><p>Third, they simplified the architecture. 
turbopuffer treats each namespace as an independent index, which removed much of the need to manage sharding and generation-based routing the way the team had before.</p><p>Finally, they handled the cutover gradually, migrating one generation at a time and validating correctness before moving on.</p><p>This is a strong pattern: if a migration is painful anyway, use it to pay off other infrastructure debt at the same time.</p><p>The outcome was solid on several fronts:</p><ul><li><p><strong>60 percent cost reduction</strong> on search engine spend</p></li><li><p><strong>35 percent reduction</strong> in AWS EMR compute costs</p></li><li><p>p50 production query latency <strong>improved from 70&#8211;100ms to 50&#8211;70ms</strong></p></li></ul><p>That is a meaningful improvement on both cost and performance, two goals that are not always easy to achieve together.</p><p><strong>Avoiding full reprocessing with page state tracking</strong></p><p>The next optimization went after a very expensive inefficiency in the indexing pipeline.</p><p>Notion pages can be long, so the team chunks each page into spans, embeds each span and stores those vectors with metadata such as authors and permissions. 
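</p><p>As a rough illustration of this span model, the sketch below chunks a page into fixed-size word windows, embeds each span and copies the page metadata onto every span record. It is a minimal sketch under stated assumptions, not Notion&#8217;s implementation: the 100-word chunking rule, the <code>SpanRecord</code> shape, the <code>page-id:span-index</code> id format and the hash-derived <code>toy_embed</code> placeholder (standing in for a real embedding model) are all hypothetical.</p>

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """One vector-database row per span (hypothetical shape)."""
    id: str            # hypothetical id format: "page-id:span-index"
    vector: list       # embedding of the span text
    metadata: dict = field(default_factory=dict)  # e.g. authors, permissions


def chunk_page(text, max_words=100):
    """Split page text into consecutive spans of at most max_words words.

    Hypothetical rule: production chunkers usually respect block boundaries.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text, dim=8):
    """Placeholder for a real embedding model.

    Returns deterministic floats in [0, 1] taken from a SHA-256 digest;
    the values carry no semantic meaning.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def index_page(page_id, text, metadata, max_words=100):
    """Chunk a page, embed every span and copy the page metadata onto each record."""
    return [
        SpanRecord(id=f"{page_id}:{i}", vector=toy_embed(span), metadata=dict(metadata))
        for i, span in enumerate(chunk_page(text, max_words))
    ]


if __name__ == "__main__":
    page = "Call me Ishmael. " * 60  # 180 words
    records = index_page("moby-dick", page, {"authors": ["melville"], "public": False})
    print(len(records), records[0].id)  # 2 moby-dick:0
```

<p>Copying the page metadata onto every span record keeps each row self-contained, at the cost of duplicating that metadata across all of a page&#8217;s spans. Note that <code>index_page</code> rebuilds every span on each call, which mirrors the full-reprocessing behavior of the original pipeline.</p><p>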
In the original implementation, any edit to a page or its properties triggered a full re-chunk, full re-embed and full re-upload of all spans on that page.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ytMS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ytMS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 424w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 848w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1272w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif" width="1000" height="189" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:189,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2615711,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ytMS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 424w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 848w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1272w, https://substackcdn.com/image/fetch/$s_!ytMS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0ca76a6-5c49-4f2b-ba2d-f612690b6b83_1000x189.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Page &#8594; chunking &#8594; embedding &#8594; vector DB with full reprocessing on every edit. 
(Source: Notion)</figcaption></figure></div><p>That meant even a tiny change could trigger a lot of unnecessary work.</p><p>The team narrowed the problem down to two things that actually mattered:</p><ol><li><p>The page text changes, so the embeddings need updating</p></li><li><p>The metadata changes, so the stored metadata needs updating</p></li></ol><p>To detect those cases, they tracked two hashes per span: one hash for the span text and another for the metadata fields. They chose 64-bit xxHash because it offered a good balance of speed, simplicity, low collision risk and small storage footprint.</p><p>For caching, they used DynamoDB. Each page had one record containing the state of all spans on that page, including text and metadata hashes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mj4k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 424w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 848w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1272w, 
https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif" width="1396" height="858" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:858,&quot;width&quot;:1396,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23088,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mj4k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 424w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 848w, 
https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1272w, https://substackcdn.com/image/fetch/$s_!Mj4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8731bd9-7cf6-489b-bd3b-66077e710a40_1396x858.avif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Span-level hashing (text + metadata) with DynamoDB state to detect and update only changed spans. 
(Source: Notion)</figcaption></figure></div><p>The win came from using that state to avoid unnecessary work.</p><p><strong>Case 1: The page text changes</strong></p><p>Imagine Herman Melville editing <em>Moby Dick</em> halfway through a page. Before this improvement, the whole page would have been re-embedded and reloaded. After the change, the system chunks the page, fetches the previous state from DynamoDB and compares text hashes span by span. It can then detect which spans actually changed and only re-embed and reload those.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xTeN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xTeN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 424w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 848w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1272w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif" width="1000" height="151" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:151,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1891331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xTeN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 424w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 848w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1272w, https://substackcdn.com/image/fetch/$s_!xTeN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fd33692-a8bf-4d28-bcd7-96a5201504f5_1000x151.gif 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Only changed spans are re-embedded and updated using page state + text hash comparison. (Source: Notion)</figcaption></figure></div><p>This is the kind of fix where getting the balance right matters. Miss a changed span and search quality suffers. Reprocess too much and cost stays high.</p><p><strong>Case 2: The metadata changes</strong></p><p>Now imagine Melville updates permissions so the page becomes visible to everyone. The permissions metadata changes, but the text does not.</p><p>Previously, that still meant re-embedding and reloading the entire page. With the new approach, Notion compares both text and metadata hashes. If the text hashes are unchanged but the metadata hashes differ, the system skips embedding entirely and issues a PATCH command to the vector database to update only the metadata. That is much cheaper than recomputing embeddings.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6qtN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6qtN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 424w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 848w, 
https://substackcdn.com/image/fetch/$s_!6qtN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1272w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif" width="1000" height="197" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:197,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2162583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6qtN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 424w, 
https://substackcdn.com/image/fetch/$s_!6qtN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 848w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1272w, https://substackcdn.com/image/fetch/$s_!6qtN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44aa866c-9f82-4a8f-b065-1e5e780a2175_1000x197.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Metadata-only changes skip embeddings and update spans via PATCH in the vector DB. (Source: Notion)</figcaption></figure></div><p>Across these changes, the Page State Project reduced data volume by 70 percent. That saved money on both embeddings API costs and vector database write costs.</p><p><strong>Moving embeddings to Ray (indexing)</strong></p><p>In July 2025, Notion started migrating its near real-time embeddings pipeline to <a href="https://www.ray.io/">Ray</a> on <a href="https://www.anyscale.com/">Anyscale</a>.</p><p>The motivation came from several pain points in the earlier setup.</p><p>One was the <strong>&#8216;double compute&#8217; problem</strong>. Spark on EMR handled preprocessing like chunking, transformations and API orchestration, but embeddings themselves were still generated through an external provider that charged per token. So the team was paying for both preprocessing infrastructure and embedding API usage.</p><p>Another issue was <strong>endpoint reliability</strong>. Fresh search indexes depended on the stability of an external embeddings API.</p><p>The third problem was <strong>clunky pipelining</strong>. 
To smooth traffic and avoid API rate limits, the team had built a multi-step handoff process where Spark jobs passed batches through S3. It worked, but it was cumbersome.</p><p>Ray and Anyscale gave Notion a cleaner path.</p><p>Ray let the team run open-source embedding models directly, which meant more model flexibility and less dependence on external providers. By consolidating preprocessing and inference onto a single compute layer, they could cut out the double-compute setup. Ray also supports pipelining CPU-bound work such as chunking and page-state detection with GPU-bound embedding generation on the same nodes, which helps keep utilization high.</p><p>There was also a developer productivity angle. Anyscale workspaces let engineers write and test pipelines from their preferred tools without having to provision infrastructure manually.</p><p>And on the product side, self-hosting embeddings removed a third-party API hop from the user-facing path, which helped reduce end-to-end latency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UN1z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UN1z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 424w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 848w, 
https://substackcdn.com/image/fetch/$s_!UN1z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1272w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif" width="1000" height="362" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:362,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1537621,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/191742179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UN1z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 424w, 
https://substackcdn.com/image/fetch/$s_!UN1z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 848w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1272w, https://substackcdn.com/image/fetch/$s_!UN1z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a163616-e23f-465c-a9d4-e974253208d3_1000x362.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a>
<figcaption class="image-caption">Ray natively supports pipelining CPU-bound tasks (chunking, detecting page state) with GPU-bound embedding generation within the same node. (Source: Notion)</figcaption></figure></div><p>The rollout is still ongoing, but early results suggest a 90+ percent reduction in embeddings infrastructure costs. That is a major shift in the economics of the system.</p><p><strong>Real-time query embeddings on Ray (serving)</strong></p><p>Indexing is only half the picture. When users or agents search in Notion, queries must also be embedded on the fly before the vector database can be searched.</p><p>That makes serving latency-sensitive. The embedding has to happen fast enough that the search still feels responsive.</p><p>Hosting large embedding models is not trivial. GPU allocation, ingress routing, replication and autoscaling all matter, especially when traffic is uneven and expectations for responsiveness are high.</p><p><a href="https://docs.ray.io/en/latest/serve/index.html">Ray Serve</a> helped Notion here by handling much of that operational layer out of the box. The team could wrap open-source embedding models in persistent deployments that stay loaded on GPU, configure request batching and replication, and manage the serving setup with plain Python code plus YAML-based infrastructure configuration.</p><p>That is a practical endpoint for the broader journey.</p><p>What started as a vector search stack built quickly enough to launch AI Q&amp;A turned into a much more refined system: simpler in some places, more selective in others, cheaper across multiple layers and faster where users feel it. The interesting part is not any single tool choice. 
It is how the Notion team kept removing bottlenecks one by one: storage limits, awkward shard routing, redundant recomputation, external API dependence and fragmented compute layers.</p><p>That is usually what mature AI infrastructure looks like in the real world. Not one giant redesign. A sequence of sharp decisions, each fixing the thing that has become too expensive, too slow or too annoying to keep around.</p><div><hr></div><h3>The full scoop</h3><p>To learn more, read the original post on <a href="https://www.notion.com/blog/two-years-of-vector-search-at-notion">Notion's Engineering Blog</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.datatinkerer.io/p/how-notion-scaled-ai-q-and-a-to-millions-of-workspaces?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0dbf0d77-87fd-4655-82da-31cc841a6d73&quot;,&quot;caption&quot;:&quot;LinkedIn pushed Venice to handle 175M+ lookups per second while ingesting 230M writes per second.<br /><br />This piece breaks down how they balanced compaction, CPU bottlenecks and adaptive throttling to scale ingestion under eventual consistency.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-19T04:00:52.353Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records&quot;,&quot;section_name&quot;:&quot;Data 
Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:187999868,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;765aa4a7-c63b-4175-8423-aae14d8d54cb&quot;,&quot;caption&quot;:&quot;Grab needed to detect schema and value issues in Kafka streams while data was still in motion.<br /><br />This piece breaks down how they introduced real-time checks and fast alerts to catch poison events before they spread.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-15T04:15:57.055Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183755897,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in February 2026]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in February 2026.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-february-2026</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-february-2026</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 12 Mar 2026 04:00:47 GMT</pubDate><enclosure 
url="https://substack-post-media.s3.amazonaws.com/public/images/b9b4d29f-5a76-44a8-902d-bc2983dbe445_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ah3D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ah3D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ah3D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif" width="800" 
height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ah3D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!ah3D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faef79236-cc80-4038-9560-5b3fe6da7359_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for February!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://openai.com/index/inside-our-in-house-data-agent">Inside OpenAI&#8217;s in-house data agent</a> (14 minute read)<br></strong>OpenAI explains how its in-house data agent combines rich internal context, live querying and self-learning memory to help employees go from vague business questions to trustworthy analysis in minutes.</p></li><li><p><strong><a href="https://www.oneusefulthing.org/p/a-guide-to-which-ai-to-use-in-the">A Guide to Which AI to Use in the Agentic Era</a> (17 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ethan Mollick&quot;,&quot;id&quot;:846835,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c05cdbc-40fd-459b-915d-f8bc8ac8bf01_3509x5263.jpeg&quot;,&quot;uuid&quot;:&quot;9368bcdb-9ff7-4426-bf98-c55a4e3f944c&quot;}" data-component-name="MentionToDOM"></span> boils the current AI tool landscape down to a simple question: which model is best for this specific task, not which one wins the internet on any given day.</p></li><li><p><strong><a href="https://stevenadler.substack.com/p/judgment-isnt-uniquely-human">Judgment isn&#8217;t uniquely human</a> (19 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Steven Adler&quot;,&quot;id&quot;:7944928,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4cc0ff3-5403-4378-bee6-aded1be48a65_2317x2317.png&quot;,&quot;uuid&quot;:&quot;dcb868dd-7766-411c-9240-7e4f3a41b4fd&quot;}" data-component-name="MentionToDOM"></span> argues that judgment and taste are not uniquely human, and that treating them as off-limits to AI is another case of people underestimating how quickly models can learn high-level cognitive tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bd7U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bd7U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 424w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 848w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp" width="1164" height="284" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:284,&quot;width&quot;:1164,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36176,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bd7U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 424w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 848w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bd7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1388f51d-9940-42e1-bda9-b1774f0a727c_1164x284.webp 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p><strong><a href="https://x.com/kareldoostrlnck/status/2019477361557926281?utm_source=tldrai&amp;utm_medium=newsletter">I spent $10,000 to automate my research at OpenAI with Codex</a> (6 minute read)</strong><br>A researcher from OpenAI argues that people still underestimate what Codex can do in real workflows, sharing a high-usage setup and the practical lessons from using it at serious scale.</p></li><li><p><strong><a href="https://commonsensedata.substack.com/p/semantic-linking-the-aboutness-of">Semantic Linking: the Aboutness of Data</a> (12 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Juha Korpela&quot;,&quot;id&quot;:195506571,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!QAUB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19dd5ae5-a523-4e05-a139-00405295f5af_2134x1853.png&quot;,&quot;uuid&quot;:&quot;44575a9f-9137-4f91-8fdb-101dee965464&quot;}" data-component-name="MentionToDOM"></span> explains that semantic linking is the missing connection between data and meaning, where the real job is not adding labels to tables but explicitly mapping data objects to shared business concepts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Pjf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!8Pjf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 424w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 848w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1272w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp" width="886" height="726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:726,&quot;width&quot;:886,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19040,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Pjf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 424w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 848w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1272w, https://substackcdn.com/image/fetch/$s_!8Pjf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F325df036-0fca-4a3f-840e-64018410c7d2_886x726.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://vinvashishta.substack.com/p/ai-is-finally-eating-softwares-total">AI Is Finally Eating Software&#8217;s Total Market: Here&#8217;s What&#8217;s Next</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Vin Vashishta&quot;,&quot;id&quot;:16324927,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4b303796-0198-4e37-9ec4-016a2f12582d_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;8c53864b-2a1a-4c48-9e00-f4e65c96daee&quot;}" data-component-name="MentionToDOM"></span> argues that AI won&#8217;t just disrupt software products; it will collapse whole layers of software value unless companies control the workflow, the customer relationship or the data moat.</p></li><li><p><strong><a href="https://joeljang.github.io/world-models-for-robotics?utm_source=tldrai&amp;utm_medium=newsletter">World Models and the Data Problem in Robotics</a> (13 minute read)<br></strong>An Nvidia researcher makes the case that robotics hits a data wall long before an algorithm wall, and that world models learned from human first-person video are the most plausible route to scalable robot intelligence.</p></li><li><p><strong><a href="https://medium.com/whatnot-engineering/lessons-learned-from-scaling-data-scientists-with-ai-e7aa7b3235b4">Lessons learned from scaling data scientists with AI</a> (10 minute read)<br></strong>Whatnot&#8217;s lesson from deploying AI for data science is that LLMs don&#8217;t remove the need for data scientists; they force teams to get serious 
about semantic layers and production-grade context management.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NSuj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NSuj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 424w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 848w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1272w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NSuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp" width="720" height="458" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:458,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36948,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NSuj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 424w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 848w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1272w, https://substackcdn.com/image/fetch/$s_!NSuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa809ae6c-3595-46e0-902e-0da2947f9c90_720x458.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents">How Shopify Scales Taxonomy Evolution Across 10,000+ Categories With Multi-Agent AI</a> (14 minute read)<br></strong>This piece breaks down how Shopify moved from reactive manual updates to a multi-agent system that scans taxonomy branches in parallel, proposes new categories/attributes from merchant data, detects duplicates and runs automated QA through domain-specific judges.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/a-portable-analytics-stack">A Portable Analytics Stack</a> (13 minute read)<br></strong><span class="mention-wrap" 
data-attrs="{&quot;name&quot;:&quot;Yuki&quot;,&quot;id&quot;:89127157,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Y7d4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F026b3d67-d3cf-4b3f-b498-7dd16df31b1e_1874x1868.png&quot;,&quot;uuid&quot;:&quot;5bc04c30-baa1-4f7d-a445-8dcf94fd6dc8&quot;}" data-component-name="MentionToDOM"></span> shows how a portable analytics stack built on DuckDB, DuckLake, dlt and SQLMesh can replace warehouse-heavy setups with lightweight, version-controlled pipelines that run locally or on cheap scheduled compute.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mc7P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mc7P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 424w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 848w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp" width="1456" height="836" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:836,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34544,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mc7P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 424w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1272w, https://substackcdn.com/image/fetch/$s_!Mc7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98af22fb-150d-440a-a6bf-fb25b14a2d2a_1456x836.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://ghostinthedata.info/posts/2026/2026-02-07-self-healing/">Healing Tables: When Day-by-Day Backfills 
Become a Slow-Motion Disaster</a> (24 minute read)<br></strong>Chris Hillman shows why incremental historical backfills corrupt dimensions and proposes a healing table pattern that separates change detection from period building, so history can be rebuilt cleanly.</p></li><li><p><strong><a href="https://joereis.substack.com/p/2028-the-great-data-reckoning">2028 - THE GREAT DATA RECKONING</a> (16 minute read)<br></strong>A speculative but funny take by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joe Reis&quot;,&quot;id&quot;:3531217,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e4716b1-c223-41e3-b943-def0291bf217_1175x783.jpeg&quot;,&quot;uuid&quot;:&quot;ac81ae97-b281-43c3-83f2-fa87aa69afe7&quot;}" data-component-name="MentionToDOM"></span> in which he imagines a 2028 shakeout of the data industry: AI wipes out much of the tooling and data theater, and the people who survive are those with real business context and architecture skills.</p></li><li><p><strong><a href="https://www.datagibberish.com/p/data-engineers-are-becoming-metadataops-engineers">Data Engineers Are Becoming MetadataOps Engineers</a> (10 minute read)</strong><br>An interesting take by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;7bc527fa-74d9-439c-8b1c-47efa24317e3&quot;}" data-component-name="MentionToDOM"></span> that the next layer of data engineering is MetadataOps: building AI-ready metadata, semantic structure and agent-facing governance so LLMs stop guessing and start using data
reliably.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rA0I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rA0I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 424w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 848w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1272w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rA0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp" width="1246" height="492" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:1246,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rA0I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 424w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 848w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1272w, https://substackcdn.com/image/fetch/$s_!rA0I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8282242-7eaf-4270-8899-acb679ca05db_1246x492.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/how-data-engineers-can-leverage-ai-tools-without-losing-fundamentals">How Data Engineers Can Leverage AI Tools Without Losing Fundamentals</a> (13 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jenny Ouyang&quot;,&quot;id&quot;:282291554,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b27a5eb-a443-4738-b205-2a29d85f00b9_1068x1068.png&quot;,&quot;uuid&quot;:&quot;6c167fee-5ac6-4e3a-9da7-b49a6b406c6e&quot;}" data-component-name="MentionToDOM"></span> makes the case that data engineers should use AI to accelerate the boilerplate rather than outsource the fundamentals, because the real leverage still comes from owning modeling, architecture and
performance.</p></li><li><p><strong><a href="https://seattledataguy.substack.com/p/backfills-the-necessary-evil-of-data">Backfills - The Necessary Evil of Data Engineering</a> (12 minute read)<br></strong>A practical look by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;4dcbf161-9f88-42ae-9512-dc6d0ba9728d&quot;}" data-component-name="MentionToDOM"></span> at why backfills happen, why engineers hate them, and how better parameterization, rerunnability, and storage-aware design can make them less painful.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bbl6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bbl6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 424w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 848w,
https://substackcdn.com/image/fetch/$s_!bbl6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1272w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bbl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp" width="800" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bbl6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 424w, 
https://substackcdn.com/image/fetch/$s_!bbl6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 848w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1272w, https://substackcdn.com/image/fetch/$s_!bbl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24264c10-ffaf-4e05-abbf-1ecb2ae8e30e_800x1000.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://luminousmen.substack.com/p/why-your-5-second-bigquery-query">Why Your 5-Second BigQuery Query Isn&#8217;t Cheap</a> (13 minute read)</strong><br>A practical breakdown of BigQuery pricing by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;luminousmen&quot;,&quot;id&quot;:29227863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffead33a9-5e35-4522-b96e-c1a523419524_300x297.jpeg&quot;,&quot;uuid&quot;:&quot;8b5a60a7-c76a-40b4-9563-67e217b7c93f&quot;}" data-component-name="MentionToDOM"></span> that shows why short query runtimes are a misleading proxy for cost and why slots are the compute metric that actually matters.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records">How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs</a> (12 minute read)<br></strong>LinkedIn pushed Venice, its derived data platform, to handle 175M+ lookups per second while ingesting 230M writes per second.
This piece breaks down how they balanced compaction, CPU bottlenecks and adaptive throttling to scale ingestion under eventual consistency.</p></li></ul><div><hr></div><h3>Data analysis and visualisation</h3><ul><li><p><strong><a href="https://mlcontests.com/state-of-machine-learning-competitions-2025/">The State of Machine Learning Competitions</a> (34 minute read)</strong><br>This report maps the 2025 competition landscape and finds that winning solutions are getting more compute-hungry, more transformer-heavy and increasingly shaped by Qwen in NLP, while classic tabular methods still hold their ground.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QrX6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QrX6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 424w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 848w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1272w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!QrX6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png" width="976" height="465" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/172d9814-58ab-437c-9093-72f0429c57a1_976x465.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:465,&quot;width&quot;:976,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54108,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QrX6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 424w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 848w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1272w, https://substackcdn.com/image/fetch/$s_!QrX6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F172d9814-58ab-437c-9093-72f0429c57a1_976x465.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it">AI Doesn&#8217;t Reduce Work - It Intensifies It</a> (9 minute read)<br></strong>An HBR piece with interesting findings that AI doesn&#8217;t so much remove work as intensify it, speeding up expectations and raising the risk of burnout instead of delivering the productivity gains companies hoped for.</p></li><li><p><strong><a href="https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b">The Anthropic Hive Mind</a> (21 minute
read)<br></strong>Steve Yegge&#8217;s take is that Anthropic&#8217;s real edge is not just Claude but a hive-mind way of working, where humans and AI operate in a shared, high-speed loop that most companies aren&#8217;t built for yet.</p></li><li><p><strong><a href="https://epochai.substack.com/p/the-least-understood-driver-of-ai">The least understood driver of AI progress</a> (36 minute read)</strong><br>Anson Ho highlights that software progress, not just bigger chips or more spending, is a major and underappreciated reason AI keeps improving faster than many people expect.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KwlD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KwlD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 424w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 848w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1272w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!KwlD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp" width="1456" height="833" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:833,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37956,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/190247984?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KwlD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 424w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 848w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1272w, https://substackcdn.com/image/fetch/$s_!KwlD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4c1544-719e-4bcb-8b02-760d97629847_1456x833.webp 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:469745}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from March I missed?
Feel free to drop a comment or hit reply; even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper"
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1b7b5909-6ab4-4a88-98a1-d09e96554f4d&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from January&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />Why &#8216;use agents or be left behind&#8217; is mostly about practical automation<br />Piecewise regression for spotting regime shifts in time series<br />Why AI benchmarks are hitting a measurement wall<br />What the data actually says about the state of open models<br />How large-scale recommendation systems are built in the real world<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />How Unity Catalog really works under the hood<br />Databricks Lakeflow vs Airflow in practice<br />End-to-end agentic data modeling with OpenMetadata<br />A candid look at the day-to-day reality of data engineering<br />How Uber cut data lake freshness from hours to minutes with Flink<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />The best data visualization projects of 2025<br />Why storytelling matters more than chart tricks<br />Designing more accessible line charts<br />Practical rules for dashboard filter placement<br /><br />Plus: ontologies explained, 
hard lessons from building AI agents in finance and new data on who&#8217;s really buying AI compute.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in January 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-05T03:20:52.027Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8f4e23e-4e9e-4420-bbed-d16a4d242c7d_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:186553359,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6e0c97bb-8be5-42ce-a02a-36b05fdd232c&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from 
December&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />The state of LLMs in 2025<br />Building a data cleaning agent with LangGraph<br />Making sense of memory in AI agents<br />Exploring TabPFN: a foundation model built for tabular data<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />Opinionated data platforms vs. open-source<br />Data quality design patterns<br />LLM for PDF data pipelines<br />DuckDB: the Swiss army knife for data engineers<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />A comprehensive guide to data visualization<br />Broken charts and 9 visualization alternatives<br /><br />Plus: The most useful skill to learn as a data professional, predictions about AI in 2026 and the next data bottleneck&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in December 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-08T05:01:52.132Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29125fa4-9a37-40a2-a85c-c795fb77137f_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183495145,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Shopify Scales Taxonomy Evolution Across 10,000+ Categories With Multi-Agent AI]]></title><description><![CDATA[From reactive manual curation to continuous taxonomy evolution grounded in merchant reality.]]></description><link>https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 26 Feb 2026 04:00:23 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!tUAj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Shopify scales its product categorisation using agentic AI.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jEOH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jEOH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jEOH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jEOH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!jEOH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6544e5f1-6242-45c5-b4e9-0fda20c0d106_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Shopify&#8217;s multi-agent taxonomy system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tUAj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tUAj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!tUAj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tUAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72968,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tUAj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 848w, 
https://substackcdn.com/image/fetch/$s_!tUAj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!tUAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10490256-5538-403a-bc50-b153a36a9b6f_1536x1024.webp 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">(Source: Shopify)</figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Shopify&#8217;s 
product classification system makes tens of millions of predictions daily across a taxonomy with 10,000+ categories and 2,000+ attributes. Commerce changes fast, so the taxonomy has to keep up or the whole stack starts drifting.</p><h4><strong>Task</strong></h4><p>Keep the taxonomy current at scale without relying on slow, reactive, manual curation. Fix volume, expertise and consistency problems before they hit merchants, customers and model quality.</p><h4><strong>Action</strong></h4><p>Built a multi-agent AI system: structural analysis + product-driven analysis, then intelligent synthesis. Added equivalence detection (category = broader category + attribute filters) plus automated QA via domain-specific AI judges.</p><h4><strong>Result</strong></h4><p>Taxonomy branches can be analyzed in parallel: hundreds of categories evaluated, versus a few per day with manual expert review. Quality improved via grounding in merchant data plus structural consistency checks, with judges filtering proposals (example: &#8220;MagSafe compatible&#8221; approved at 93% confidence).</p><h4><strong>Use Cases</strong></h4><p>Category discovery, attribute gap detection, taxonomy maintenance, search and filtering improvement</p><h4><strong>Tech Stack/Framework</strong></h4><p>AI agents, equivalence detection, multi-agent system</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>Last year, over 875 million people bought items from Shopify merchants. Shopify already runs a product classification system that makes tens of millions of predictions daily with a high degree of accuracy.</p><p>But classification is the easy part compared to the thing underneath it: the taxonomy. The model doesn&#8217;t just need to be right; it also needs a clean, consistent set of labels to be right <em>about</em>.</p><p>That&#8217;s the challenge for Shopify: once you have 10,000+ categories and 2,000+ attributes, the taxonomy becomes its own product with its own failure modes. It can get stale. It can get inconsistent. 
It can drift away from how merchants actually describe products. And when that happens, the classifier takes the blame for what is really a taxonomy debt problem.</p><p>So this post is about what Shopify did next: they built a multi-agent AI system that goes beyond classifying products and actively improves the taxonomy labels themselves, so the system stays agile as commerce changes.</p><div><hr></div><h4>The challenge: scaling taxonomy without losing accuracy</h4><p>A taxonomy is a contract between three groups that rarely agree:</p><ul><li><p>Merchants describing products the way they think about them</p></li><li><p>Customers searching and filtering with their own mental model</p></li><li><p>Platform systems trying to enforce structure so everything stays queryable and comparable</p></li></ul><p>Now add the reality that commerce never sits still. New products appear. Old categories split. Entire verticals get reshaped by trends, tech and regulation. The taxonomy has to keep up or the platform drifts away from how people actually shop and sell.</p><p>Shopify frames the challenge as three problems.</p><p><strong>The volume problem: manual updates can&#8217;t keep up</strong></p><p>A global product taxonomy needs constant attention. Every new product type, emerging technology category and seasonal trend can trigger taxonomy updates.</p><p>Manual curation becomes a bottleneck because taxonomy work is rarely a single change. It is usually a bundle: a category addition, a hierarchy decision, a set of attributes, naming alignment and a check for duplicates or conflicts.</p><p>For example, consider the emergence of categories like smart home devices or remote work equipment. 
Each one calls for not just a new category but an entirely new set of attributes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u4Rg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u4Rg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 424w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 848w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1272w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp" width="1456" height="497" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:497,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42484,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u4Rg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 424w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 848w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1272w, https://substackcdn.com/image/fetch/$s_!u4Rg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08494fd3-dfe6-4019-acce-a25af7e18b77_2462x841.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A new category example (Source: Shopify)</figcaption></figure></div><p>Smart home devices, for instance, need attributes such as connectivity type, power requirements and compatibility: specs that did not previously exist in the taxonomy.</p><p>So the work isn&#8217;t a one-off. It&#8217;s continuous expansion and adjustment across a giant tree of concepts.</p><p><strong>The expertise problem: every vertical has its own rules</strong></p><p>Good taxonomy design is domain-heavy. You do not get it right by being generally smart. You get it right by knowing what matters in that product world: for example, the nuanced differences between types of guitar pickups, or which attributes are appropriate for skincare products.</p><p>A taxonomy team can&#8217;t realistically maintain deep expertise across every vertical that merchants sell into. 
But if the taxonomy is inconsistent or poorly structured, merchants pay for it through reduced discoverability, suboptimal search results and ineffective filters for customers.</p><p><strong>The consistency problem: one concept, five different labels</strong></p><p>As the taxonomy grows organically, inconsistencies creep in:</p><ul><li><p>similar concepts represented differently across categories</p></li><li><p>inconsistent naming conventions</p></li><li><p>discrepancies between merchant categorization and customer expectations</p></li></ul><p>Those inconsistencies compound. Merchants get confused when listing. Customers get frustrated when filtering and comparing. And classifier quality drops because labels stop being reliably meaningful across the tree.</p><p>This is the part most teams underestimate. In a taxonomy, small inconsistencies behave like small data quality issues: they don&#8217;t stay small.</p><div><hr></div><h4>From manual taxonomy work to agent-led evolution</h4><p>Shopify&#8217;s taxonomy management evolved from a manual workflow into an AI-driven system.</p><p><strong>The old way: expert review, slow throughput</strong></p><p>The traditional pattern is familiar:</p><ol><li><p>domain experts analyze product data</p></li><li><p>identify gaps or inconsistencies</p></li><li><p>propose changes</p></li><li><p>implement changes via careful review</p></li></ol><p>It ensures quality, but it also creates bottlenecks.</p><p>The biggest problem was that the process was reactive: Shopify would only recognize the need for new categories or attributes <em><strong>after</strong></em> merchants began listing products that didn&#8217;t fit. 
By then, the system had already missed chances to give merchants and customers a better experience.</p><p>So even when you do great manual work, you&#8217;re always late.</p><p><strong>The breakthrough: two lenses, one system</strong></p><p>Advanced language models opened a door: not to replace human experts, but to augment them with scale and consistency.</p><p>The key insight was that taxonomy improvement comes from two different angles:</p><ul><li><p><strong>structural analysis</strong>: the logical structure of the taxonomy, gaps in hierarchies, missing relationships</p></li><li><p><strong>product-driven analysis</strong>: what real product data says merchants actually sell and how they describe it</p></li></ul><p>Each angle catches different issues. Shopify&#8217;s breakthrough was combining them into a system that can continuously propose improvements and then filter them through quality checks before human review.</p><div><hr></div><h4>Inside the system: how the agents work</h4><p>The new architecture rests on three principles:</p><ul><li><p>specialized analysis</p></li><li><p>intelligent coordination</p></li><li><p>quality assurance</p></li></ul><p>And the intent is clear: continuous evolution, not one-time taxonomy construction.</p><p><strong>What&#8217;s different: continuous evolution, not one-time creation</strong></p><p>AI has been used for product categorisation and one-off taxonomy builds for a while. The difference here is that, instead of building the taxonomy once and hoping it holds, Shopify uses specialised AI agents to keep it evolving continuously. There are three core components to this approach:</p><p><strong>1- Real product grounding: </strong>The system integrates actual merchant product data so proposals reflect how merchants describe and categorize products. 
This keeps decisions grounded in commerce reality rather than in theory alone.</p><p>In other words: if merchants are consistently describing a differentiator, it probably belongs in the taxonomy, even if it offends someone&#8217;s idea of a &#8220;pure&#8221; category tree.</p><p><strong>2- Multi-agent specialization: </strong>Multiple specialized agents run different analyses. One focuses on structural consistency. Another focuses on product-driven insights. Then those outputs are synthesized. The claim here is that the combination finds improvements that neither agent would find alone.</p><p>That makes sense: taxonomy is both a graph problem and a language problem.</p><p><strong>3- Sophisticated equivalence discovery: </strong>This is the most interesting component: detecting equivalence relationships where a specific category equals a broader category filtered by attribute values.</p><p>This matters because merchants should be able to organize their catalogs however they want, while the platform still understands what products &#8216;mean&#8217; underneath each merchant&#8217;s choices.</p><p>So instead of forcing everyone into one rigid structure, Shopify tries to learn mappings that preserve flexibility and still support search, recommendations and analytics.</p><p><strong>Architecture flow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jtG3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jtG3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 424w, 
https://substackcdn.com/image/fetch/$s_!jtG3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 848w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1272w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jtG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp" width="470" height="840" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:840,&quot;width&quot;:470,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jtG3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 
424w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 848w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1272w, https://substackcdn.com/image/fetch/$s_!jtG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e057eb-1a81-48b4-be6f-6fc481b3a4c1_470x840.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">AI Agent architecture flow (Source: Shopify)</figcaption></figure></div><p>The AI agent workflow works like this:</p><ul><li><p>enable agents to explore the taxonomy</p></li><li><p>run multi-stage analysis (structural + product-driven)</p></li><li><p>synthesize and resolve conflicts</p></li><li><p>detect equivalences</p></li><li><p>run automated QA using judges</p></li><li><p>send refined proposals to humans</p></li><li><p>update the taxonomy in production</p></li></ul><div><hr></div><h4>Enabling agent-taxonomy interaction</h4><p>Before agents can improve anything, they need to &#8216;read&#8217; the taxonomy like a human would.</p><p>Shopify implemented a system that allows agents to:</p><ul><li><p>search for related categories</p></li><li><p>examine hierarchical relationships</p></li><li><p>verify whether proposed changes conflict with existing elements</p></li></ul><p>A good example: an agent analyzing guitar-related categories can explore the full musical instruments hierarchy, inspect related attributes across instruments and look for patterns that suggest better structure.</p><p>In other words, the agent doesn&#8217;t just look at one node. 
It roams the neighborhood.</p><div><hr></div><h4>The pipeline: specialised agents, staged decisions</h4><p>For the AI agent system to work properly, several specialised agents each contribute a specific kind of insight:</p><p><strong>Structural analysis: </strong>This agent looks at the taxonomy itself for logical consistency, completeness, gaps in category hierarchies, naming convention inconsistencies and opportunities to reorganize related concepts.</p><p>It operates purely on the taxonomy structure and aims to keep the whole thing coherent.</p><p><strong>Product-driven analysis: </strong>This agent integrates real merchant data and examines how products are described and categorized on the platform.</p><p>Specifically, it looks at patterns in product titles, product descriptions and merchant-defined categories. The goal is to find gaps between how merchants think about products and how the taxonomy represents them.</p><p>This is an important distinction. A taxonomy can be structurally perfect and still be useless if it doesn&#8217;t match merchant reality.</p><p><strong>Intelligent synthesis: </strong>Now we have two streams of recommendations:</p><ul><li><p>structure-driven improvements</p></li><li><p>product-driven improvements</p></li></ul><p>They can conflict. They can overlap. They can propose redundant changes.</p><p>The synthesis step merges insights, resolves conflicts and eliminates redundancies. Sometimes the best answer is not to pick one but to combine both.</p><p><strong>Equivalence detection: </strong>This agent solves a practical commerce problem: merchants want flexibility but platform systems need consistency.</p><p>Consider golf shoes:</p><ul><li><p>Merchant A uses a specific &#8216;Golf Shoes&#8217; category</p></li><li><p>Merchant B uses &#8216;Athletic Shoes&#8217; with an &#8216;Activity Type = Golf&#8217; attribute</p></li></ul><p>Both are valid choices for each merchant.
But search, recommendations and analytics benefit from knowing that these represent the same product set.</p><p>So the system detects attribute-based equivalences of the form:</p><blockquote><p>specific category = broader category + one or more attribute filters</p></blockquote><p>This lets merchants organize their catalogs however makes sense for their business while keeping platform intelligence consistent across different catalog structures.</p><p>If you&#8217;ve ever tried to do cross-merchant analytics at scale, you can probably feel why Shopify cared enough to build an entire agent for this.</p><div><hr></div><h4>Automated QA: judges before humans</h4><p>After proposals are generated, Shopify adds automated QA through specialized AI judges.</p><p>These judges evaluate proposed changes using reasoning capabilities and taxonomy design principles to filter and refine suggestions before human review.</p><p>The important detail is that evaluation differs by change type:</p><ul><li><p>adding new attributes</p></li><li><p>creating category hierarchies</p></li><li><p>modifying existing structures</p></li></ul><p>Different changes require different criteria, so one generic &#8216;judge prompt&#8217; would be weak. Instead, they use <strong>domain-specific judges</strong>.</p><p>An electronics-focused judge applies electronics expertise. A musical instruments judge applies that domain&#8217;s patterns and rules. The goal is consistent domain-aware evaluation across verticals.</p><div><hr></div><h3>Results</h3><p>The system can analyze taxonomy branches in parallel, identifying improvement opportunities that used to take weeks of manual work.</p><p>Where experts might analyze a few categories per day, the system can evaluate hundreds of categories, checking both:</p><ul><li><p>structural consistency</p></li><li><p>alignment with real product data</p></li></ul><p>This matters most for emerging product categories.
When new product types become popular on the platform, the system can quickly identify taxonomy gaps and propose comprehensive solutions, instead of reactive patches that build up debt.</p><p><strong>Quality improvements</strong></p><p>The multi-agent design improves consistency and comprehensiveness because it combines two lenses:</p><ul><li><p>structural analysis keeps hierarchy organization logical and consistent</p></li><li><p>product-driven analysis keeps categories and attributes aligned with merchant reality</p></li></ul><p>The automated QA layer reduces iteration cycles by catching issues before human review and applying domain expertise consistently.</p><p><strong>Example: mobile phone accessories and MagSafe compatibility</strong></p><p>Product analysis identified that merchants frequently advertise &#8220;MagSafe support&#8221; for accessories such as chargers, cases and wallets.</p><p>So the agent proposed adding a boolean attribute: &#8216;MagSafe compatible.&#8217;</p><p>A specialized electronics judge evaluated the proposal and checked:</p><ul><li><p>no duplicate attribute already exists</p></li><li><p>boolean type is appropriate</p></li><li><p>while brand-specific, MagSafe is treated as a legitimate technical standard similar to Bluetooth or Qi</p></li></ul><p>The judge approved the attribute with <strong>93% confidence</strong>, noting it would improve customer filtering for MagSafe-ready products.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M4Uu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!M4Uu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 424w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 848w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1272w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp" width="1456" height="1038" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1038,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:215182,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M4Uu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 424w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 848w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1272w, https://substackcdn.com/image/fetch/$s_!M4Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e5da6d9-d490-4773-a9a5-77b2d8b2166d_2048x1460.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">MagSafe example (Source: Shopify)</figcaption></figure></div><p>This example matters because it demonstrates the full loop:</p><ul><li><p>merchant reality creates a signal</p></li><li><p>the agent proposes a structured change</p></li><li><p>a domain judge validates it with rule checks and domain framing</p></li><li><p>humans get a higher-quality proposal to review</p></li></ul><p><strong>Scaling development: from reactive fixes to proactive evolution</strong></p><p>The biggest shift is strategic: taxonomy development becomes proactive, not reactive.</p><p>Instead of waiting for a merchant pain point or a platform limitation to trigger a change, the system can identify and address gaps earlier.</p><p>The system can also reason over the entire taxonomy structure, which supports cross-category consistency.
That helps avoid the fragmentation you get when teams fix issues in isolation.</p><p>To validate the approach, they applied it to a specific area: <strong>Electronics &gt; Communications &gt; Telephony</strong> (called &#8220;Telephony AI&#8221; in their analysis) and compared it against their previous manual expansion method.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3I-O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3I-O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 424w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 848w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1272w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3I-O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp" width="1456" height="1088" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1088,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/188769392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3I-O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 424w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 848w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1272w, https://substackcdn.com/image/fetch/$s_!3I-O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ae93987-6ddd-4093-891c-44bed1b0a9ff_1558x1164.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">AI Agent impact (Source: Shopify)</figcaption></figure></div><p>As the chart shows, the AI-assisted method could compress years of taxonomy work into weeks if the agents were applied across all verticals.</p><div><hr></div><h3>The full scoop</h3><p>To learn more, check out the <a href="https://shopify.engineering/product-taxonomy-at-scale">Shopify's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want
to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-shopify-scales-taxonomy-evolution-across-10000-categories-with-ai-agents?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e78745a9-87b3-4842-91b1-c47c28b3e197&quot;,&quot;caption&quot;:&quot;Production ML isn&#8217;t only about clever architectures.
It&#8217;s about judgment, trade-offs and systems that hold up when data is messy.<br /><br />I sat down with Ahsaas Bajaj, Senior ML Engineer at Instacart, to talk about how they handle product substitutions at scale, what actually moves business metrics and what changes when you move into a senior ML role.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to Build a Recommendation System at Scale: Insights from Instacart&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:175610076,&quot;name&quot;:&quot;Ahsaas Bajaj&quot;,&quot;bio&quot;:&quot;Senior Machine Learning Engineer II at Instacart&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34dac958-9c70-4f48-89ed-6c2e0d6f197e_899x901.png&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://bajajahsaas.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://bajajahsaas.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;Ahsaas
Bajaj&quot;,&quot;primaryPublicationId&quot;:7296320}],&quot;post_date&quot;:&quot;2026-01-29T03:30:24.563Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e6c5924-ed6c-4998-8e4a-8f88d9102c8b_844x473.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:181648418,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;85d1fa84-549b-4cc1-b9b2-ea55a5e0b6fb&quot;,&quot;caption&quot;:&quot;DoorDash built an anomaly detection platform to catch fraud trends before they result in huge top-line losses.<br /><br />This piece breaks down how they scan hundreds of millions of overlapping segments each day, cut fraud detection time from 100+ days to under three and save tens of millions annually by finding small signals while they still look like noise.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How DoorDash Saves Tens of Millions of Dollars Per Year by Detecting Fraud 30&#215; Faster&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and
learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-23T05:56:24.141Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:185495640,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How LinkedIn Built a Pipeline That Scales to 230M Records/sec Without Breaking SLAs]]></title><description><![CDATA[From partition strategy to adaptive throttling, the playbook behind Venice&#8217;s ingestion evolution.]]></description><link>https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records</guid><dc:creator><![CDATA[Data 
Tinkerer]]></dc:creator><pubDate>Thu, 19 Feb 2026 04:00:52 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how LinkedIn ingests data at scale.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y5YD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y5YD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y5YD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!y5YD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42e71ce5-0a12-4514-b7a0-ddfd0460289a_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid.
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Venice: LinkedIn&#8217;s data storage platform.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w,
https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5184" height="3456" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3456,&quot;width&quot;:5184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a computer screen with a facebook page on it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a computer screen with a facebook page on it" title="a computer screen with a facebook page on it" srcset="https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1712217559097-cc2aaf698767?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxM3x8bGlua2VkaW58ZW58MHx8fHwxNzcxMTMwNjQ1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@getswello">Swello</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Venice powers LinkedIn&#8217;s AI-driven products and has scaled to 2,600+ stores with workloads spanning bulk loads, streaming updates and active/active replication. The ingestion pipeline had to handle throughput-heavy, CPU-heavy and latency-sensitive traffic under eventual consistency.</p><h4><strong>Task</strong></h4><p>Redesign ingestion to scale to 230M writes/sec while preserving ordering and protecting read and write SLAs. Support hybrid stores, partial updates and multi&#8211;data center replication without destabilizing clusters.</p><h4><strong>Action</strong></h4><p>Scaled bulk ingestion with partition tuning, shared consumer/writer pools and direct SST writes; tuned RocksDB via compaction triggers and BlobDB to manage amplification. Optimized CPU-heavy paths using Fast-Avro and parallel processing, then enforced priority pools and adaptive throttling to protect current-version latency.</p><h4><strong>Result</strong></h4><p>Venice now handles 175M+ key lookups/sec and 230M+ writes/sec in production. It keeps write latency within a 10-minute SLA while treating read latency as the top priority.</p><h4><strong>Use Cases</strong></h4><p>Large-scale feature stores, real-time recommendation systems, hybrid data serving, low-latency notifications</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, Apache Samza, Apache Kafka, RocksDB, Fast-Avro, Adaptive Throttling</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Background</h4><p><a href="https://github.com/linkedin/venice">Venice</a> is an open-source derived data storage platform and LinkedIn&#8217;s default storage layer for online AI use cases. 
It sits behind products like People You May Know, feed, videos, ads, notifications, the A/B testing platform, LinkedIn Learning and more.</p><p>Since Venice launched internally in 2016, it has scaled from a handful of stores to over 2,600 production stores. The workloads evolved considerably too. It started with &#8220;just bulk load a dataset&#8221; and grew into a mix of:</p><ul><li><p>Bulk loading huge offline datasets</p></li><li><p>Nearline streaming updates</p></li><li><p>Active/active replication across data centers</p></li><li><p>Partial updates that merge fields and collections</p></li><li><p>Deterministic write latency expectations under eventual consistency</p></li></ul><p>This post walks through how the ingestion pipeline was revamped to hit <strong>230 million records per second in production</strong>, what changed across the architecture, which optimizations moved the needle and how different workload types get tuned. A lot of these ideas are portable if you run any distributed ingestion system where ordering, throughput and predictable latency all matter at once.</p><div><hr></div><h4>Venice overall ingestion pipeline</h4><p>At a high level, store owners write to Venice through three paths:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BTop!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BTop!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 424w, 
https://substackcdn.com/image/fetch/$s_!BTop!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 848w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1272w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png" width="600" height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BTop!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 424w, 
https://substackcdn.com/image/fetch/$s_!BTop!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 848w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1272w, https://substackcdn.com/image/fetch/$s_!BTop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda1d3a94-d249-4fcc-82c0-50950cf13705_600x297.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Venice overall ingestion pipeline (Source: LinkedIn)</figcaption></figure></div><ol><li><p><strong>Bulk loads</strong> from an offline processing platform (for example, Spark)</p></li><li><p><strong>Nearline writes</strong> from a stream processing platform (for example, Samza)</p></li><li><p><strong>Direct writes</strong> from online applications</p></li></ol><p>Whichever path a write takes, it passes through an intermediate PubSub broker layer. From there, the Venice Storage Node (VSN) consumes the messages and persists the data locally in RocksDB (an embedded key-value store).</p><p>The pipeline sounds straightforward until you operate it at scale, because the same ingestion path has to support very different workloads. Some are throughput-driven (bootstrapping a massive store). Some are latency-driven (current-version updates). Some are CPU-heavy (partial updates and conflict resolution). Some are I/O-heavy (compaction, SST churn).</p><p>The following sections look at these challenges and how the LinkedIn team resolved them.</p><div><hr></div><h4>Use case 1: bootstrapping from an offline dataset</h4><p>Venice users can run bulk load jobs on offline processing platforms such as Spark to push new data versions to Venice stores. The hard part is keeping those jobs fast for very large stores. 
To find the bottlenecks, you need to understand the ingestion path end to end.</p><p><strong>What happens during a bulk load</strong></p><ul><li><p>A Venice Push Job (VPJ) creates a new version topic for the new store version, split into multiple partitions</p></li><li><p>The Spark job uses a map-reduce framework to produce messages to that version topic</p></li><li><p>It keeps one reducer per topic partition so that message ordering is preserved</p></li><li><p>On the other side, the VSN spins up consumers, reads the messages and persists them into RocksDB</p></li><li><p>There is one RocksDB instance per topic partition</p></li></ul><p>So you can hit bottlenecks in three obvious places:</p><ol><li><p>producing</p></li><li><p>consuming</p></li><li><p>persisting</p></li></ol><p>Production experience says you will hit all three, just not on the same day.</p><p><strong>Improving producing and consuming throughput</strong></p><p>The usual first lever is increasing the number of partitions for large stores so you can use more of the PubSub cluster&#8217;s capacity. More partitions tend to mean more parallelism and more throughput.</p><p>But they come with trade-offs:</p><ul><li><p>more partitions mean more management overhead across Venice and PubSub</p></li><li><p>each PubSub broker has a throughput ceiling, so adding partitions stops paying off once the brokers are saturated</p></li></ul><p>So partition count is not a free lunch. It&#8217;s a knob that buys you throughput and charges you complexity.</p><p><strong>Enhancing consumption scalability</strong></p><p>To keep up with the producers, VSN uses shared consumer pools across all hosted stores.</p><p>Instead of &#8220;one store version, one dedicated set of consumers,&#8221; each store version can use multiple consumers from the shared pool by distributing its hosted partitions among them. 
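</p><p>A minimal sketch of the shared-pool idea, written as a toy Python model. The class and method names (<code>SharedConsumerPool</code>, <code>subscribe</code>, <code>consumers_for</code>) and the round-robin assignment are assumptions for illustration, not Venice&#8217;s actual API or assignment algorithm. It shows one store version&#8217;s partitions spreading across several consumers while the pool size caps the total consumer count on the node.</p><pre><code>from collections import defaultdict


class SharedConsumerPool:
    """Toy shared consumer pool (hypothetical, not Venice's API).

    A fixed number of consumers is shared by every store version hosted on a
    node, so the consumer count stays capped no matter how many stores or
    partitions the node hosts.
    """

    def __init__(self, pool_size):
        self.pool_size = pool_size
        # consumer id mapped to the (topic, partition) pairs it reads
        self.assignments = defaultdict(list)
        self._cursor = 0  # round-robin cursor shared across all store versions

    def subscribe(self, version_topic, partitions):
        # Spread one version topic's partitions over several consumers
        # instead of dedicating a single consumer to the whole store version.
        for partition in partitions:
            consumer_id = self._cursor % self.pool_size
            self.assignments[consumer_id].append((version_topic, partition))
            self._cursor += 1

    def consumers_for(self, version_topic):
        # Which pooled consumers are reading this version topic?
        return sorted(
            cid
            for cid, pairs in self.assignments.items()
            if any(topic == version_topic for topic, _ in pairs)
        )


pool = SharedConsumerPool(pool_size=4)
pool.subscribe("storeA_v3", range(6))  # a large store version with 6 partitions
pool.subscribe("storeB_v7", range(2))  # a small store version with 2 partitions
print(pool.consumers_for("storeA_v3"))  # [0, 1, 2, 3]
print(len(pool.assignments))  # 4: the pool size caps the consumer count
</code></pre><p>Round-robin is only the simplest policy to illustrate, and Venice&#8217;s real assignment logic may differ. In this toy model, <code>storeA_v3</code> gets four parallel consumers even though the node never runs more than four in total.</p><p>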
The point is to keep multiple connections open per PubSub broker to speed up consumption (similar to a <a href="https://en.wikipedia.org/wiki/Download_manager">Download Manager</a> that fetches one file over several connections).</p><p>The pool approach also does something boring but important: it sets an upper limit on the total number of consumers, which caps cost.</p><p><strong>Optimizing I/O performance</strong></p><p>VSN uses a shared writer pool to persist changes concurrently across multiple RocksDB instances and make effective use of the local SSDs.</p><p>Ordering is critical in Venice, so any given RocksDB instance has only one writer actively writing to it. You still get concurrency across instances, just not within a single instance, and that compromise is what keeps ordering intact.</p><p><strong>Minimizing memory overhead</strong></p><p>Because messages for a partition are strictly ordered (thanks to the map-reduce framework), Venice uses <a href="https://github.com/facebook/rocksdb/wiki/creating-and-ingesting-sst-files">RocksDB&#8217;s <code>SstFileWriter</code></a> to generate SST files directly. 
That significantly reduces memory overhead during ingestion.</p><p><strong>Ingestion workflow in Venice Server</strong></p><p>Put together, the optimized workflow is basically: use the PubSub layer for distribution, use consumer pools for scalable reads, use writer pools for SSD throughput, preserve ordering by design and avoid memory blowups by writing SST files directly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pbHX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pbHX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 424w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 848w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1272w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png" width="1200" height="944" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:944,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191738,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pbHX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 424w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 848w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1272w, https://substackcdn.com/image/fetch/$s_!pbHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb8e50b-527c-4d5b-81fe-4a43bb0f3bc1_1200x944.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Optimized Venice pipeline (Source: LinkedIn)</figcaption></figure></div><div><hr></div><h4>Use case 2: hybrid store</h4><p>Venice supports Lambda-architecture-style use cases by merging updates from both <strong>bulk loads</strong> and <strong>nearline writes</strong>. 
Users query a single store and get a unified view.</p><p><strong>Venice hybrid store workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BaZo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BaZo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 424w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 848w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1272w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png" width="1024" height="375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BaZo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 424w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 848w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1272w, https://substackcdn.com/image/fetch/$s_!BaZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94bfa8e8-d929-494f-ad6b-b512ede167aa_1024x375.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Hybrid store workflow (Source: LinkedIn)</figcaption></figure></div><p>How it works:</p><ul><li><p>each bulk load creates a new store version</p></li><li><p>that version has a new Kafka topic and a new database instance</p></li><li><p>real-time updates, produced by a Samza job to a real-time topic, are appended to both the current and the new version topics to keep them up to date</p></li><li><p>once the new version has fully caught up, it is swapped in as the active version to serve reads</p></li></ul><p>The hybrid store is important because it gives you a clean &#8220;new version build&#8221; story without losing real-time freshness. 
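</p><p>The version lifecycle above can be sketched as a toy Python model. The class and method names are hypothetical. Replaying real-time records from a rewind point right after the offline data is a simplifying assumption about how a new version picks up recent nearline writes. Treat this as an illustration, not Venice&#8217;s implementation.</p><pre><code>class ToyHybridStore:
    """Toy hybrid store: versions, version topics and a real-time topic.

    Hypothetical names throughout; this is not Venice's code.
    """

    def __init__(self):
        self.rt_topic = []  # shared real-time topic of (key, value) records
        self.version_topics = {}  # version number mapped to its record list
        self.dbs = {}  # version number mapped to its local key-value map
        self.offsets = {}  # version number mapped to records applied so far
        self.current = None  # version that currently serves reads

    def bulk_load(self, version, offline_records, rewind_to=0):
        # A bulk load creates a new version with its own topic and database.
        # Offline data comes first, then real-time records are replayed from
        # a rewind point (assumption) so recent nearline writes land on top.
        topic = list(offline_records) + self.rt_topic[rewind_to:]
        self.version_topics[version] = topic
        self.dbs[version] = {}
        self.offsets[version] = 0

    def nearline_write(self, key, value):
        # Nearline writes go to the real-time topic and are appended to every
        # existing version topic, keeping current and new versions fresh.
        self.rt_topic.append((key, value))
        for topic in self.version_topics.values():
            topic.append((key, value))

    def consume(self, version):
        # Apply pending version-topic records in order, like a storage node.
        topic, db = self.version_topics[version], self.dbs[version]
        for key, value in topic[self.offsets[version]:]:
            db[key] = value
        self.offsets[version] = len(topic)

    def swap_if_caught_up(self, version):
        # Only a fully caught-up version is swapped in to serve reads.
        if self.offsets[version] == len(self.version_topics[version]):
            self.current = version
        return self.current == version

    def get(self, key):
        return self.dbs[self.current].get(key)


store = ToyHybridStore()
store.bulk_load(1, [("a", 1), ("b", 1)])
store.consume(1)
store.swap_if_caught_up(1)
store.nearline_write("a", 2)  # fresher than the upcoming offline value
store.bulk_load(2, [("a", 0), ("c", 3)])
print(store.swap_if_caught_up(2))  # False: version 2 has not consumed yet
store.consume(1)
store.consume(2)
print(store.swap_if_caught_up(2))  # True
print(store.get("a"), store.get("c"))  # 2 3
</code></pre><p>Reads keep hitting version 1 until version 2 has applied every record in its topic. The nearline write to <code>"a"</code> wins over the stale offline value because it is replayed after the bulk data.</p><p>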
But the hybrid model creates a new challenge: the database transitions from <strong>read-only</strong> to <strong>read-write</strong>.</p><p>That&#8217;s where <a href="https://github.com/facebook/rocksdb/wiki">RocksDB</a> tuning matters, because stale versions of a key start showing up more often: keys get updated or deleted after they were first inserted. RocksDB relies on <a href="https://github.com/facebook/rocksdb/wiki/Compaction">compaction</a> to remove stale entries, but compaction has a cost: it scans, merges and rewrites SST files, consuming CPU, I/O and disk.</p><p>So the core problem becomes: tune RocksDB to balance <a href="https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#amplification-factors">three competing types of pain</a>:</p><ul><li><p><strong>Write amplification</strong>: bytes written to storage vs bytes written to the DB</p></li><li><p><strong>Read amplification</strong>: number of disk reads per query</p></li><li><p><strong>Space amplification</strong>: size of DB files on disk vs the actual data size</p></li></ul><p>Venice uses <a href="https://github.com/facebook/rocksdb/wiki/Leveled-Compaction">leveled compaction</a> by default and relies primarily on two methods to balance those trade-offs.</p><p><strong>1. Tuning the compaction trigger</strong></p><p>The key setting here is:</p><ul><li><p><strong>level0_file_num_compaction_trigger</strong></p></li></ul><p>This sets how many files can accumulate in Level-0 before a compaction is triggered. 
Once Level-0 reaches that many files, compaction kicks in and pushes SST files from Level-0 into Level-1, and further down the tree as each level exceeds its target size.</p><p>Why it matters:</p><ul><li><p>higher threshold &#8594; fewer compactions &#8594; lower write amplification</p></li><li><p>but also more Level-0 files &#8594; higher read amplification, since Level-0 files can overlap and a read may need to check several of them</p></li><li><p>plus higher space amplification, because stale duplicates hang around longer</p></li></ul><p>Venice tunes this per cluster because clusters have different bottlenecks:</p><ul><li><p><strong>memory-serving clusters</strong> keep data in RAM to speed up lookups. Memory is the limiting resource, so they set a <strong>lower threshold</strong> to reduce space amplification</p></li><li><p><strong>disk-serving clusters</strong> are often limited by disk I/O, so they set a <strong>higher threshold</strong> to reduce compaction frequency and the disk write rate</p></li></ul><p>This is a practical tuning philosophy: tune for your real bottleneck, not for a generic best practice.</p><p><strong>2. 
RocksDB BlobDB integration</strong></p><p><a href="https://github.com/facebook/rocksdb/wiki/BlobDB">BlobDB</a> is aimed at large-value workloads through key-value separation:</p><ul><li><p>Large values go into blob files</p></li><li><p>LSM tree stores small pointers</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DT0h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DT0h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 424w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 848w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1272w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png" width="1200" height="447" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:447,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:156086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DT0h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 424w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 848w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1272w, https://substackcdn.com/image/fetch/$s_!DT0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d11f76-4d71-47dd-a41a-37f0bc48666a_1200x447.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">RocksDB BlobDB structure</figcaption></figure></div><p>This avoids rewriting large values over and over during compaction, which reduces write amplification. The cost is extra space amplification: when a value is overwritten or deleted, its blob becomes unreferenced and takes up space until garbage collection reclaims it.</p><p>For Venice, BlobDB integration significantly reduced write amplification in multi-tenant clusters, especially for large-value use cases. The reported impact is big: <strong>more than a 50% reduction in disk write throughput</strong>. 
That matters because it avoided having to scale out clusters while CPU and storage space were still available.</p><p>The win: you stop paying the compaction tax over and over on the same large payloads.</p><div><hr></div><h4>Use case 3: Active/active replication with partial update</h4><p>Venice guarantees eventual consistency, not strong consistency. Because writes take time to propagate, clients cannot safely perform read-modify-write operations directly: the value they read may already be stale.</p><p>To handle this, Venice introduces <strong>partial update</strong>, a specialized operation that supports field-level updates and collection merges.</p><p><strong>Venice partial update workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ay5v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ay5v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 848w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png" width="840" height="1320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1320,&quot;width&quot;:840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:279575,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ay5v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 424w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 848w, 
https://substackcdn.com/image/fetch/$s_!ay5v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1272w, https://substackcdn.com/image/fetch/$s_!ay5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64784492-8dfb-479c-8521-7b2ce0f62c5d_840x1320.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Venice partial update (Source: LinkedIn)</figcaption></figure></div><p>Inside the Venice server, the leader replica:</p><ul><li><p>decodes the incoming payload</p></li><li><p>applies the update</p></li><li><p>re-encodes the result</p></li><li><p>writes to the local database</p></li><li><p>writes to the version topic</p></li></ul><p>Follower replicas then consume the merged results. Most of this work (decoding, merging, re-encoding) is CPU-heavy.</p><p>The platform then evolved further with active/active replication across multiple data centers. The key mechanism is deterministic conflict resolution (DCR), similar in spirit to CRDTs: Venice tracks update timestamps at both the row and field level, compares each incoming timestamp with the stored one and decides whether to apply or skip the update.</p><p><strong>Venice Active/Active workflow</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!36Hk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!36Hk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 424w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 848w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1272w, 
https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png" width="1024" height="1516" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1516,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:510735,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/187999868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!36Hk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 424w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 848w, 
https://substackcdn.com/image/fetch/$s_!36Hk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1272w, https://substackcdn.com/image/fetch/$s_!36Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F285f3b43-13b3-4437-b6fd-72984728e60b_1024x1516.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Venice Active/Active workflow (Source: LinkedIn)</figcaption></figure></div><p>Now the leader replica has even more to do for DCR:</p><ul><li><p>timestamp metadata lookup</p></li><li><p>decoding</p></li><li><p>encoding</p></li></ul><p>Again, all of this is CPU-heavy, so the optimizations below focus on CPU efficiency.</p><p><strong>1. Fast-Avro adoption</strong></p><p><a href="https://github.com/linkedin/avro-util">Fast-Avro</a> was originally developed by RTBHouse. LinkedIn later took over its maintenance under the LinkedIn namespace and introduced many optimizations.</p><p>The key idea: Fast-Avro is an alternative to Apache Avro&#8217;s built-in serialization and deserialization. It uses runtime code generation and performs significantly better than the native implementation. It supports multiple Avro versions at runtime and is widely adopted inside LinkedIn.</p><p>Venice fully integrated Fast-Avro and, in one major use case, saw up to a <strong>90% improvement in deserialization latency at p99</strong> on the application side.</p><p><strong>2. Parallel processing</strong></p><p>In the traditional pipeline, DCR and partial update operations were executed sequentially, record by record, within each partition. That left CPU capacity underutilized.</p><p>Venice introduced parallel processing so that multiple records within the same partition can be processed concurrently <em>before</em> being produced to the version topic, while strict ordering is still preserved in the final step.</p><p>Result: significantly higher write throughput for these complex record types.</p><div><hr></div><h4>Use case 4: Active/active replication with deterministic write latency</h4><p>Eventually consistent systems are still judged by human expectations. People want their writes to show up, and they want that to happen predictably.</p><p>Venice is versioned and can ingest backup, current and future versions concurrently in a single server instance. 
In practice, though, only the current version serves reads, so deterministic write latency guarantees focus mostly on it.</p><p>To improve determinism, Venice introduced a pooling strategy for ingestion, with <strong>different priorities</strong> for different workload types. The consumer phase is the first phase of the server ingestion pipeline, and prioritization is achieved by controlling the polling rate of each pool.</p><p>Broad priority tiers:</p><ul><li><p>top priority: active/active and partial update workloads for the <strong>current version on the leader replica</strong> (CPU-intensive and latency-sensitive)</p></li><li><p>next: other workload types targeting the current version</p></li><li><p>then: active/active or partial update workloads for backup or future versions on the leader replica</p></li><li><p>finally: everything else, in a lower-priority bucket</p></li></ul><p>This design aims to do a few practical things:</p><ul><li><p>isolate CPU-heavy workloads so they don&#8217;t slow down lighter ones</p></li><li><p>prioritize the current version so the most up-to-date data flows smoothly</p></li><li><p>keep the number of pools small so that resource management doesn&#8217;t turn into a second job</p></li></ul><p>The catch is tuning. Clusters see different workloads, store behavior varies widely even within one cluster, throughput swings over time and read traffic changes throughout the day. Static configs force you to tune for the worst case, which wastes resources most of the time.</p><p>So Venice introduced adaptive throttling, which dynamically adjusts ingestion based on recent performance:</p><ul><li><p>if the system is within the agreed SLAs, ingestion rates are adjusted according to priorities</p></li><li><p>if an SLA is violated, ingestion is throttled back immediately</p></li></ul><p>Defining the SLAs matters. Venice focuses on two key criteria:</p><ol><li><p><strong>Read latency SLA</strong>: highest priority. 
Never violate read latency SLAs, even at the cost of ingestion throughput</p></li><li><p><strong>Write latency SLA for the current version</strong>: as long as read latency SLAs are met, write latency for the current version takes top priority, and pools are tuned proportionally to maximize utilization and throughput</p></li></ol><div><hr></div><h4><strong>Wrapping up</strong></h4><p>With these optimizations, Venice at LinkedIn handles:</p><ul><li><p>Over <strong>175 million key lookups per second</strong></p></li><li><p>Over <strong>230 million writes per second</strong></p></li><li><p>All while maintaining a <strong>write latency SLA under 10 minutes</strong></p></li></ul><div><hr></div><h3>The full scoop</h3><p>To learn more, check out the <a href="https://www.linkedin.com/blog/engineering/infrastructure/evolution-of-the-venice-ingestion-pipeline">LinkedIn's Engineering Blog</a> post on this topic.</p><div><hr></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-linkedin-built-a-pipeline-that-scales-to-230-million-records?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7dd74b6f-84de-4b87-a0cf-3e440ec7dc65&quot;,&quot;caption&quot;:&quot;Grab needed to detect schema and value issues in Kafka streams while data was still in motion.<br /><br />This piece breaks down how they introduced real-time checks and fast alerts to catch poison events before they spread.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-15T04:15:57.055Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183755897,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2b5e61e3-2de5-4088-981d-80de61411bd4&quot;,&quot;caption&quot;:&quot;Uber rebuilt its data lake ingestion to move freshness from hours to minutes.<br /><br />This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Uber Cut Data Lake Freshness From Hours to 
Minutes With Flink&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-02T04:30:31.300Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:182833470,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:1,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in January 2026]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in January 2026.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026</link><guid 
isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-january-2026</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 05 Feb 2026 03:20:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c8f4e23e-4e9e-4420-bbed-d16a4d242c7d_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hhar!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hhar!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!hhar!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hhar!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hhar!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!hhar!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!hhar!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5fadc-b0e9-40b1-88c7-13007ac8b1e4_800x402.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis), including videos, courses and projects. They can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for January!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://rlancemartin.github.io/2026/01/09/agent_design/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Agent design patterns</a> (8 minute read)<br></strong>An Anthropic engineer provides a grounded guide to designing AI agents that separates real, reliable architectures from overcomplicated agent hype that doesn&#8217;t survive production.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bCMy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bCMy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 424w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 848w, 
https://substackcdn.com/image/fetch/$s_!bCMy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1272w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bCMy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png" width="1456" height="662" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:662,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bCMy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 424w, 
https://substackcdn.com/image/fetch/$s_!bCMy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 848w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1272w, https://substackcdn.com/image/fetch/$s_!bCMy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34bed63e-01f2-4740-9291-19f48056bb6f_1974x898.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://timdettmers.com/2026/01/13/use-agents-or-be-left-behind?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Use Agents or Be Left Behind? A Personal Guide to Automating Your Own Work</a> (31 minute read)</strong><br>Tim Dettmers cuts through the agent hype, arguing the real value isn&#8217;t autonomous magic but practical agents that reliably coordinate tools, memory and execution.</p></li><li><p><strong><a href="https://theforecaster.substack.com/p/piecewise-regression-for-time-series">Piecewise Regression for Time Series Forecasting</a> (7 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Rami Krispin&quot;,&quot;id&quot;:116325603,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17d6b557-4338-48c7-ba47-12899dddc77e_3541x3541.jpeg&quot;,&quot;uuid&quot;:&quot;b8501dc7-40ac-4b98-9a94-7f007562707c&quot;}" data-component-name="MentionToDOM">Rami Krispin</span> shares a practical walkthrough of using piecewise regression on time series to detect structural breaks, regime changes and trend shifts that single global models tend to smooth over.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jl_j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jl_j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 424w, 
https://substackcdn.com/image/fetch/$s_!jl_j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 848w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1272w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jl_j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:45984,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!jl_j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 424w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 848w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1272w, https://substackcdn.com/image/fetch/$s_!jl_j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc18836e-b7b1-41cf-8d83-67fea83ac853_1456x819.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://www.artificialintelligencemadesimple.com/p/ai-is-hitting-a-measurement-wall">AI is Hitting a Measurement Wall</a> (27 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Devansh&quot;,&quot;id&quot;:8101724,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48081c70-8afa-41e3-a44e-b0f917bc7577_1200x1600.jpeg&quot;,&quot;uuid&quot;:&quot;d6db2ead-6480-40db-843a-8367916eb34a&quot;}" data-component-name="MentionToDOM"></span> makes the case that today&#8217;s AI benchmarks are saturated and misleading. He argues they mask a growing gap between how models score on tests and the value they deliver in real applications.</p></li><li><p><strong><a href="https://towardsdatascience.com/drift-detection-in-robust-machine-learning-systems/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Drift Detection in Robust Machine Learning Systems</a> (18 minute read)</strong><br>The article shows how unnoticed drift can quietly degrade model performance and outlines practical techniques to detect it early in production.</p></li><li><p><strong><a href="https://www.interconnects.ai/p/8-plots-that-explain-the-state-of">8 plots that explain the state of open models</a> (7 minute read)<br></strong>Eight charts by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Interconnects 
AI&quot;,&quot;id&quot;:48206,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/robotic&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c52e8097-8f3d-4f7e-808b-2f4ad37f3b52_720x720.png&quot;,&quot;uuid&quot;:&quot;73df8696-4f8a-456d-9784-a4b832044ed9&quot;}" data-component-name="MentionToDOM"></span> cut through the noise to show that Chinese open models, led by Qwen, dominate real-world adoption and benchmarks, while Western challengers only compete at the very top end.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PO2B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PO2B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 424w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 848w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 1272w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PO2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp" width="1456" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:49346,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PO2B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 424w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 848w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 1272w, https://substackcdn.com/image/fetch/$s_!PO2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe505afa7-ea5f-46a4-bc96-50a0b71781b0_1456x1200.webp 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://aashidutt.substack.com/p/llms-as-judges-measuring-bias-hinting">LLMs as Judges: Measuring Bias, Hinting Effects, and Tier Preferences</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Aashi&quot;,&quot;id&quot;:167292575,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d539466-2556-451d-b783-13fb60c28bf0_144x144.png&quot;,&quot;uuid&quot;:&quot;f79f296c-3d8e-4e07-b2fa-9ff3df6a1323&quot;}" 
data-component-name="MentionToDOM"></span> and <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sayak&quot;,&quot;id&quot;:5753925,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ea2e9c7-f95e-4da6-9971-8e75384902d1_500x500.jpeg&quot;,&quot;uuid&quot;:&quot;e48311f2-82a9-4111-b629-c0269ca0d15f&quot;}" data-component-name="MentionToDOM"></span> examine when LLMs can act as evaluators, showing how bias, prompt framing and hinting can distort model-as-judge benchmarks.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale">How to Build a Recommendation System at Scale: Insights from Instacart</a> (10 minute read)<br></strong>A practical walk-through by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ahsaas Bajaj&quot;,&quot;id&quot;:175610076,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34dac958-9c70-4f48-89ed-6c2e0d6f197e_899x901.png&quot;,&quot;uuid&quot;:&quot;2d33319e-2798-4159-9843-f21a33367409&quot;}" data-component-name="MentionToDOM"></span> of how large-scale recommendation systems are actually built in production, covering the modeling choices and tradeoffs that matter once you move past toy examples.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://vutr.substack.com/p/i-spent-5-hours-learning-unity-catalog">I spent 5 hours learning Unity Catalog. 
Here&#8217;s everything you need to know</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Vu Trinh&quot;,&quot;id&quot;:167177248,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4805f673-db97-4f7c-85c4-44b345a8de80_256x256.png&quot;,&quot;uuid&quot;:&quot;c36904c1-7997-4c4b-84c1-cc07f367a13b&quot;}" data-component-name="MentionToDOM"></span> provides a breakdown of how Databricks&#8217; open-sourced Unity Catalog works under the hood.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a8cL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a8cL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 848w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!a8cL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a8cL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp" width="1456" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77856,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a8cL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 424w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 848w, 
https://substackcdn.com/image/fetch/$s_!a8cL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1272w, https://substackcdn.com/image/fetch/$s_!a8cL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b165bd5-ff03-43b1-921d-bf0f8d4f1952_1456x1040.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/databricks-lakeflow-vs-apache-airflow">Databricks Lakeflow 
vs Apache Airflow</a> (13 minute read)<br></strong>A candid comparison by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Daniel Beach&quot;,&quot;id&quot;:21715962,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;uuid&quot;:&quot;45d7c0b4-a878-48a7-9c93-cf571b32ab66&quot;}" data-component-name="MentionToDOM"></span> showing how Databricks Lakeflow trades Airflow&#8217;s flexibility and openness for tighter platform integration, simpler ops and better defaults if you&#8217;re already all-in on Databricks.</p></li><li><p><strong><a href="https://www.datagibberish.com/p/the-certifications-scam">The Certifications Scam</a> (7 minute read)<br></strong>A blunt takedown of data certifications by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Yordan Ivanov&quot;,&quot;id&quot;:40945395,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ma-p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76f52904-5428-4d97-82a5-3faa722b8d46_2234x1253.jpeg&quot;,&quot;uuid&quot;:&quot;d037c2d2-4a2d-4ab2-b694-509b385c8f66&quot;}" data-component-name="MentionToDOM"></span>, arguing they mostly serve as marketing and gatekeeping rather than reflecting real skills, experience or on-the-job impact.</p></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/end-to-end-agentic-data-modeling-with-openmetadata-and-mcp">End To End Agentic Data Modeling: Using AI and OpenMetadata MCP for Impact Analysis</a> (8 minute read)</strong><br>A hands-on look at building end-to-end agentic data modeling by combining OpenMetadata with MCP-style agents to automate lineage, context 
sharing and model evolution across the data stack, by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;814b43cd-dbe6-4127-a76a-edfb4560b7ac&quot;}" data-component-name="MentionToDOM"></span> with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Pipeline to Insights&quot;,&quot;id&quot;:42238863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd98ddb69-fdec-4599-b3f2-906f7673c8de_408x408.png&quot;,&quot;uuid&quot;:&quot;6d531ff8-6365-44c6-98c7-9e3ba9bcc39f&quot;}" data-component-name="MentionToDOM"></span>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qbBX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qbBX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 424w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 848w, 
https://substackcdn.com/image/fetch/$s_!qbBX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qbBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp" width="1297" height="782" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:782,&quot;width&quot;:1297,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31526,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qbBX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 424w, 
https://substackcdn.com/image/fetch/$s_!qbBX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 848w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbBX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4605018-38ba-4caa-aefb-707302b61f92_1297x782.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://dlthub.com/blog/building-semantic-models-with-llms-and-dlt?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Autofilling the Boring Semantic Layer: From Sakila to Chat-BI with dltHub</a> (9 minute read)</strong><br>Adrian Brudaru explores how LLMs can help generate and maintain semantic models on top of data pipelines, reducing manual modeling effort while keeping analytics definitions consistent.</p></li><li><p><strong><a href="https://www.ssp.sh/blog/diary-of-a-data-engineer">A Diary of a Data Engineer</a> (13 minute read)<br></strong>A candid, day-in-the-life reflection by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Simon Sp&#228;ti&quot;,&quot;id&quot;:27855874,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6fc84efb-1b87-4fb3-bfb1-076664f32de4_2199x2199.jpeg&quot;,&quot;uuid&quot;:&quot;36f92e57-2c7f-4348-b6a2-af9c8037e210&quot;}" data-component-name="MentionToDOM"></span> on what data engineering actually looks like in practice, highlighting the unglamorous but essential work that keeps data systems running.</p></li><li><p><strong><a href="https://www.brentozar.com/archive/2026/01/database-development-with-ai-in-2026?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Database Development with AI in 2026</a> (11 minute read)</strong><br>Brent Ozar argues that in 2026 AI will meaningfully speed up database development tasks like query writing and troubleshooting, but that real impact still depends on human judgment and an understanding of production constraints.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink">How Uber Cut Data Lake Freshness From Hours to Minutes With Flink</a> (11 minute 
read)<br></strong>Uber rebuilt its data lake ingestion to move freshness from hours to minutes. This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KkKT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KkKT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 424w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 848w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1272w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KkKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp" width="768" height="349" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:349,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KkKT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 424w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 848w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1272w, https://substackcdn.com/image/fetch/$s_!KkKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cfb36f7-e003-4497-9e59-8a3d8b2f1d7f_768x349.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3>Data analysis and visualisation</h3><ul><li><p><strong><a href="https://flowingdata.com/2025/12/31/best-data-visualization-2025?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Best Data Visualization Projects of 2025</a> (3 minute read)</strong><br>FlowingData rounds up its picks for the best data visualisation projects of 2025.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E-Ir!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!E-Ir!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 424w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 848w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1272w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png" width="750" height="579" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:579,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32463,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!E-Ir!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 424w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 848w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1272w, https://substackcdn.com/image/fetch/$s_!E-Ir!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9021a-0859-4415-a8ce-f7f8f507a015_750x579.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://joseparreogarcia.substack.com/p/storytelling-with-data-book-review">The book that finally taught me how to tell stories with data</a> (12 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;id&quot;:255728031,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;uuid&quot;:&quot;d67f4c91-08f3-46e1-acaa-b45e7b2d73a7&quot;}" data-component-name="MentionToDOM"></span> reviews <em>Storytelling with Data</em>, highlighting that impact comes from knowing your audience and framing the message first, not from visualisation tricks.</p></li><li><p><strong><a href="https://nrennie.rbind.io/blog/accessible-line-chart?utm_source=datatinkerer.io&amp;utm_medium=newsletter">How to create a more accessible line chart</a> (10 minute read)<br></strong>Nicola Rennie shows how small design choices in line charts (color, contrast, labeling and annotations) can dramatically improve accessibility without sacrificing clarity or insight.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TWIW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TWIW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 424w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 848w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TWIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png" width="1344" height="1008" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1008,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74549,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TWIW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 424w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 848w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!TWIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b966f9-6048-4043-b385-ea02f874b6b0_1344x1008.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong><a href="https://nastengraph.substack.com/p/5-rules-for-dashboard-filter-placement">5 Rules for Dashboard Filter Placement</a> (6 minute read)</strong><br><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Anastasiya Kuznetsova&quot;,&quot;id&quot;:99725349,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!2E6h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb9d9c-d4e0-4f30-bc37-73eb9ffe4d53_516x534.png&quot;,&quot;uuid&quot;:&quot;0ee8d645-8d3e-4016-a263-296df3043a06&quot;}" data-component-name="MentionToDOM"></span> breaks down five practical rules for placing dashboard filters so users understand what they&#8217;re controlling without adding cognitive load or 
breaking trust.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://williaminmon.substack.com/p/ontologies-some-perspectives">ONTOLOGIES - SOME PERSPECTIVES</a> (20 minute read)<br></strong>A great intro and explanation of ontologies by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;William Inmon&quot;,&quot;id&quot;:125217701,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c2feef8-6c8a-42f6-b044-9823cbe10e5d_144x144.png&quot;,&quot;uuid&quot;:&quot;20b045af-deb2-4334-8e3d-bf06afa00bd9&quot;}" data-component-name="MentionToDOM"></span> (Bill Inmon) and <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jessica Talisman&quot;,&quot;id&quot;:24176542,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zEsI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18f1fe4e-779e-4a27-be92-71fac460ee01_935x935.jpeg&quot;,&quot;uuid&quot;:&quot;3094c2d1-7a16-4b32-8ce3-f2a9962adbc9&quot;}" data-component-name="MentionToDOM"></span>. 
Well worth a read if you have heard the term a lot but are not sure what it means or how it can be applied.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zdtA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zdtA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 424w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 848w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1272w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zdtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp" width="336" height="262" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:262,&quot;width&quot;:336,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9792,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zdtA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 424w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 848w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1272w, https://substackcdn.com/image/fetch/$s_!zdtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F785284ed-c831-4e35-baac-6f2092e946ba_336x262.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://www.nicolasbustamante.com/p/lessons-from-building-ai-agents-for?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Lessons from Building AI Agents for Financial Services</a> (23 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Nicolas Bustamante&quot;,&quot;id&quot;:17282676,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/cba30217-51b3-4192-b82a-d4f006dd8ad3_1536x2049.jpeg&quot;,&quot;uuid&quot;:&quot;ccdeb7af-9549-4253-b592-bc84780150ed&quot;}" data-component-name="MentionToDOM"></span> breaks down what building AI agents actually looks like in production, separating real engineering constraints from agent hype.</p></li><li><p><strong><a 
href="https://epoch.ai/blog/introducing-the-ai-chip-sales-data-explorer?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Introducing the AI Chip Sales Data Explorer</a> (3 minute read)</strong><br>Epoch AI introduces an interactive dataset tracking global AI chip sales, shedding light on who&#8217;s actually buying compute and how hardware demand is shaping the AI race.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H598!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H598!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!H598!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!H598!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!H598!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H598!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151406,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/186553359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H598!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!H598!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!H598!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!H598!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25e6466-e44a-4728-94a6-454b921092f7_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li></ul><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:443083}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from January I missed? 
Feel free to drop a comment or hit reply; even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6e0c97bb-8be5-42ce-a02a-36b05fdd232c&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from December&#128071;<br /><br />&#119811;&#119834;&#119853;&#119834; &#119826;&#119836;&#119842;&#119838;&#119847;&#119836;&#119838; &amp;amp; &#119808;&#119816;<br />The state of LLMs in 2025<br />Building a data cleaning agent with LangGraph<br />Making sense of memory in AI agents<br />Exploring TabPFN: a foundation model built for tabular data<br /><br />&#119811;&#119834;&#119853;&#119834; &#119812;&#119847;&#119840;&#119842;&#119847;&#119838;&#119838;&#119851;&#119842;&#119847;&#119840;<br />Opinionated data platforms vs. 
open-source<br />Data quality design patterns<br />LLM for PDF data pipelines<br />DuckDB: the Swiss army knife for data engineers<br /><br />&#119811;&#119834;&#119853;&#119834; &#119808;&#119847;&#119834;&#119845;&#119858;&#119852;&#119842;&#119852; &amp;amp; &#119809;&#119816;<br />A comprehensive guide to data visualization<br />Broken charts and 9 visualization alternatives<br /><br />Plus: The most useful skill to learn as a data professional, predictions about AI in 2026 and the next data bottleneck&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in December 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-08T05:01:52.132Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29125fa4-9a37-40a2-a85c-c795fb77137f_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:183495145,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;738bc346-aa09-4943-a656-9c97ecf88686&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from November&#128071;<br /><br />Data Science &amp;amp; AI<br />Context engineering becomes the real bottleneck for AI agents<br />Classic algorithms still beat most enterprise AI in ROI<br />A practical framework to identify true agentic use cases<br />Gemini 3 benefits from direct structured prompting<br /><br />Data Engineering<br />DuckLake revives relational metadata for lakehouses<br />Event streaming hits market saturation<br />Real-world consulting lessons point to simpler pipelines over hype<br />Dark data hoarding kills AI signal<br /><br />Data Analysis &amp;amp; BI<br />Dashboard testing gets a full end-to-end checklist<br />Guidance on balancing accuracy vs speed when answering business questions.<br /><br />Plus: AI-coded &#8220;good enough&#8221; apps shift the buy-vs-build boundary, low-tech industries become prime AI adopters as margins flip and new benchmark analysis suggests model performance is mostly general capability with a smaller &#8220;Claudiness&#8221; axis on top.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in November 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about 
AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-12-03T07:52:29.847Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c31550f6-1fdf-4738-b384-2eeb55f71662_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:180567973,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:3,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How to Build a Recommendation System at Scale: Insights from Instacart]]></title><description><![CDATA[A Senior ML Engineer on production constraints, rules vs ML and the workflow behind large-scale recommender systems]]></description><link>https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-to-build-a-recommendation-system-at-scale</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 29 Jan 2026 03:30:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3e6c5924-ed6c-4998-8e4a-8f88d9102c8b_844x473.png" 
length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>Following on from previous posts where I talked to people in the field, today I&#8217;m talking with Ahsaas Bajaj, a Senior Machine Learning Engineer at Instacart. He works on large-scale recommendation systems that serve millions of customers.</p><p>We talked about his rise from software engineering to machine learning at Instacart, how he decides between rules-based and ML approaches and how he approaches the work now as a more senior stakeholder.</p><p>So without further ado, let&#8217;s get into it!</p><div><hr></div><h4><strong>Can you tell us a bit about your role?</strong></h4><p>I&#8217;m a Senior ML Engineer at Instacart, working across customer and shopper experiences on large-scale recommendation systems that make millions of decisions each day. For the past three years, I&#8217;ve led the technical strategy for the Product Substitutions ML system, focused on solving the out-of-stock problem.</p><p>The goal is simple: when an item isn&#8217;t available, suggest a replacement that preserves customer intent and keeps the order intact. My role spans system design, modeling and evaluation, balancing customer satisfaction, shopper efficiency and business impact at scale.</p><div><hr></div><h4><strong>How did you get into machine learning?</strong></h4><p>My path into ML wasn&#8217;t a straight line. I started as a software engineer at Samsung Research on the on-device search team, which pushed me deep into information retrieval and search system design. That work sparked an interest in research and led me to pursue a graduate degree in computer science.</p><p>That experience shaped how I approach ML today: less focus on models in isolation, more on how systems behave in production. 
I wanted that work to have real user impact, which took me to Walmart Labs and eventually to Instacart.</p><div class="pullquote"><p><em><strong>Ahsaas&#8217;s path</strong></em></p><p><em><strong>software engineer &#8594; data scientist &#8594; ML engineer &#8594; senior ML engineer</strong></em></p></div><h4><strong>What does a &#8216;typical&#8217; week look like for you?</strong></h4><p>As I&#8217;ve moved into a more senior role, the balance has shifted from pure coding to a mix of execution and direction. My week usually breaks down into three buckets:</p><p><strong>Alignment (30%)</strong>: The glue work. I spend time with product, backend engineering, and leadership aligning on roadmaps. The focus isn&#8217;t just <em>what</em> we&#8217;re building, but <em>why</em>, making sure ML work ties directly to business goals.</p><p><strong>Deep work (30%)</strong>: Hands-on modeling, coding and system design. Staying close to the code is non-negotiable for me, even at a senior level.</p><p><strong>Analysis and &#8220;the why&#8221; (40%)</strong>: This is where I spend the most time. I dig into model errors, read raw customer complaints about failed substitutions and sanity-check improvement ideas. This is also where I write proposal docs. In my view, the highest-leverage work a senior MLE does is deciding what problems to solve next, not just executing on what&#8217;s assigned.</p><div><hr></div><h4><strong>How do you decide when a problem actually needs ML or if rules-based is good enough?</strong></h4><p>I think about it in terms of complexity versus value.</p><p>If a problem can be solved deterministically with clear rules and those rules are stable and understandable, that&#8217;s often the right solution. Machine learning becomes useful when the space of behaviors is too large, nuanced, or context-dependent for rules to scale.</p><p>Good data is also a prerequisite. 
Without reliable signals and feedback loops, even the most sophisticated model won&#8217;t perform well in production.</p><div><hr></div><h4><strong>You have written about your work on a recommendation model at Instacart. Can you share a summary of what you have done?</strong></h4><p>I&#8217;ve spent the past three years leading the technical development of Instacart&#8217;s <a href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">Product Substitutions system</a>, which handles millions of replacement decisions daily. The core challenge is deceptively simple: when a customer&#8217;s requested item is out of stock, what should we suggest instead?</p><p>What makes this interesting from an ML perspective is that it&#8217;s fundamentally a relevance problem, not a search problem. We&#8217;re not just matching product attributes&#8212;we&#8217;re trying to understand what the customer actually wanted and find alternatives that preserve that intent. This required rethinking how we model the relationship between items, how we define &#8220;good&#8221; substitutions, and how we evaluate success in a way that maps to real customer satisfaction.</p><p>The system has evolved significantly over time, moving from simpler heuristics to more sophisticated learned representations. 
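</p><p>To make &#8220;learned representations&#8221; a little more concrete, here is a minimal sketch of a Siamese-style substitute scorer in Python. It is a hypothetical illustration, not Instacart&#8217;s implementation: the numeric product vectors, the single tanh layer with random (untrained) weights, cosine similarity as the score and all function names are assumptions. What it does show is the pattern behind the Siamese network diagram from Instacart&#8217;s post: the same product layer encodes both the original item and each candidate, and candidates are ranked by how close their encodings are.</p>
<pre><code>
```python
import math
import random

# Hypothetical stand-in for product features (e.g. encoded category, brand, size).
Vector = list[float]


def make_product_layer(in_dim: int, out_dim: int, seed: int = 0):
    """Return one encoder with random, untrained weights (illustration only)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(features: Vector) -> Vector:
        # Linear projection followed by tanh. The SAME weights encode every
        # product, which is what makes the two branches "Siamese".
        return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

    return encode


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_substitutes(encode, original: Vector, candidates: dict[str, Vector]) -> list[tuple[str, float]]:
    """Score each candidate against the out-of-stock original, best match first."""
    anchor = encode(original)
    scored = [(name, cosine(anchor, encode(vec))) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with made-up feature vectors for a hypothetical out-of-stock item.
encode = make_product_layer(in_dim=3, out_dim=4, seed=42)
original = [1.0, 0.2, 0.0]
candidates = {"similar_item": [0.9, 0.25, 0.0], "unrelated_item": [-1.0, 0.0, 1.0]}
for name, score in rank_substitutes(encode, original, candidates):
    print(f"{name}: {score:.3f}")
```
</code></pre>
<p>A production model would learn the shared weights from training data rather than draw them at random; this sketch only illustrates the shape of the computation.</p><p>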
But the north star has always been the same: keep orders complete while respecting what customers actually care about.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YXvE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YXvE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 424w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 848w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1272w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YXvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp" width="720" height="187" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:187,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8232,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/181648418?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YXvE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 424w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 848w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1272w, https://substackcdn.com/image/fetch/$s_!YXvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d474a9-46da-42ae-be81-a0ca692fb52f_720x187.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Siamese network (Source: <a 
href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">Instacart</a>)</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uWDY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uWDY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 424w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 848w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1272w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uWDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp" width="720" height="447" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:447,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15228,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/181648418?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uWDY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 424w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 848w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1272w, https://substackcdn.com/image/fetch/$s_!uWDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe97d103-ab8d-46c0-a6c1-dc95484a86c1_720x447.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Product layer: one each for original and candidate product (Source: <a href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">Instacart</a>)</figcaption></figure></div><div><hr></div><h4><strong>And what has been the impact on the business?</strong></h4><p>Substitutions sit at a critical junction in the order lifecycle. When done well, they&#8217;re invisible - customers get what they need and the order stays intact. 
When done poorly, they create friction everywhere: customers reject items or request refunds, shoppers waste time on unsuccessful suggestions, and order values drop.</p><p>Our work has meaningfully moved the needle on the metrics that matter: replacement acceptance rates, refund frequency, and what we call &#8220;perfect order fill rate&#8221;&#8212;the percentage of orders where every item was either found or successfully replaced. These improvements compound across millions of weekly orders.</p><p>Beyond the immediate transactional metrics, we&#8217;ve also seen positive signals in repeat ordering behavior and customer satisfaction scores, particularly for orders that required multiple substitutions. Instacart has <a href="https://investors.instacart.com/static-files/27fac1c6-da32-40ca-8ef4-c8261b5ee12b">referenced</a> this system publicly when discussing operational improvements at scale.</p><p>For me, the real validation is when customers don&#8217;t notice the algorithm at all - they just notice their groceries arrived complete.</p><div><hr></div><h4><strong>What does the tech stack look like for ML at Instacart?</strong></h4><p>Instacart&#8217;s ML stack is built around an internal platform called <a href="https://www.instacart.com/company/tech-innovation/griffin-how-instacarts-ml-platform-tripled-ml-applications-in-a-year">Griffin</a>, which standardizes the end-to-end ML lifecycle, from feature engineering and training to deployment and real-time inference. A core piece of this is a shared Feature Marketplace, where teams define, version and reuse batch and streaming features with strong offline-to-online consistency.</p><p>Workflows are orchestrated with Apache Airflow and model training runs through a unified abstraction that supports multiple compute backends and common ML frameworks. 
With <a href="https://tech.instacart.com/introducing-griffin-2-0-instacarts-next-gen-ml-platform-b7331e73b8d7">Griffin 2.0</a>, the platform moved to a Kubernetes-based setup and added distributed training with Ray, which significantly improved scalability and iteration speed.</p><p>Griffin also includes a centralized model registry and metadata store, making experiments easier to track and reproduce. In production, models are deployed as standardized services that handle feature loading and low-latency inference across both customer and shopper experiences.</p><p>The main benefit is focus: teams spend less time on infrastructure and more time on modeling, evaluation and trade-offs.</p><div><hr></div><h4><strong>How do you use AI in your day-to-day work and where do you find it genuinely valuable?</strong></h4><p>I&#8217;ve integrated GenAI primarily to shift my focus from execution to decision-making. It&#8217;s useful for routine tasks like scaffolding data pipelines or optimizing SQL queries, but I find the highest leverage comes from <strong>qualitative analysis</strong>.</p><p>I routinely feed thousands of customer comments and shopper notes about bad substitutions into LLM-driven pipelines that cluster the feedback into coherent themes. What used to be unstructured noise becomes a prioritized list of failure modes, so I spend less time parsing data and more time solving the specific problems that actually affect customer trust.</p><div><hr></div><h4><strong>How has your perspective changed since moving into a more senior role?</strong></h4><p>The biggest shift is realizing that <strong>Judgment &gt; Code</strong>. Early in my career, I obsessed over the <em>how</em> - the architecture, the libraries, the latency. Now, I obsess over the <em>what</em> and the <em>why</em>. The real work is filtering ideas. 
In a sea of seemingly good ideas, my job is to find the <em>most promising</em> one - the one with the highest ROI - and kill the others.</p><p>I&#8217;ve also learned that <strong>Writing is Engineering.</strong> You cannot build big things alone. To get buy-in from leadership and cross-functional teams, you must be able to write crisp, narrative-driven proposals that explain <em>why</em> this mathematical solution solves a human problem.</p><div class="pullquote"><p><strong>The biggest shift is realizing that Judgment &gt; Code</strong></p></div><h4><strong>What&#8217;s one thing you wish you&#8217;d known earlier about machine learning?</strong></h4><p>The value of <strong>error analysis</strong>. It&#8217;s easy to celebrate aggregate metrics like accuracy or F1, but the real breakthroughs come from studying the &#8220;horror cases,&#8221; where the model is confidently wrong. Those examples are uncomfortable to look at, but they&#8217;re where the most useful ideas come from. You can&#8217;t fix what you don&#8217;t deeply understand.</p><div><hr></div><p>If you enjoyed reading this, check out Ahsaas&#8217;s <a href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af">original article</a> about his work at Instacart.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QfuD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 424w, 
https://substackcdn.com/image/fetch/$s_!QfuD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 848w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1272w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QfuD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png" width="692" height="394" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:394,&quot;width&quot;:692,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://tech.instacart.com/how-instacart-uses-machine-learning-to-suggest-replacements-for-out-of-stock-products-8f80d03bb5af&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/181648418?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!QfuD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 424w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 848w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1272w, https://substackcdn.com/image/fetch/$s_!QfuD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F276ea8ad-5379-4559-8bae-2cb8d384a294_692x394.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><p>Was there a question that you would like to ask?</p><p><strong>Let me know your thoughts by replying to the email or leaving a comment below!</strong></p><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/from-dental-cleaning-to-data-cleaning-pivoting-to-healthcare-analytics?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE3NjQ3MzIyMywiaWF0IjoxNzY4ODkyNzM0LCJleHAiOjE3NzE0ODQ3MzQsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.vxPR9Jc4G7L4Yjw3wvlaaj8dKYSscG1A_D7Wiblqr1o&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/p/from-dental-cleaning-to-data-cleaning-pivoting-to-healthcare-analytics?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE3NjQ3MzIyMywiaWF0IjoxNzY4ODkyNzM0LCJleHAiOjE3NzE0ODQ3MzQsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.vxPR9Jc4G7L4Yjw3wvlaaj8dKYSscG1A_D7Wiblqr1o"><span>Share</span></a></p><div><hr></div><h3><strong>Keep reading</strong></h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2d3312e0-ea8d-4705-959d-5748abc99f31&quot;,&quot;caption&quot;:&quot;Today I will be talking with Jose Parre&#241;o Garcia who is a Senior Data Science Manager at 
Skyscanner and writer of the Senior Data Science Lead newsletter.<br /><br />We talked about his rise from data analyst to Senior DS Manager at Skyscanner, what &#8220;production-ready&#8221; really means and why the real intelligence in data science lives before and after the model.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;From Data Analyst to Senior DS Manager at Skyscanner&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:255728031,&quot;name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;bio&quot;:&quot;I write about Data Science, Machine Learning and leading data teams. I have built teams from scratch and lead 50+ data scientists @Skyscanner. 
Now, I share my experience with you.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://joseparreogarcia.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://joseparreogarcia.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;Senior Data Science Lead&quot;,&quot;primaryPublicationId&quot;:2833541}],&quot;post_date&quot;:&quot;2025-11-13T03:54:26.969Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06735d58-e8f2-4106-88ae-efe0658c217c_764x661.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:176541975,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c93130ad-2195-48c8-b16d-9ee951675f0b&quot;,&quot;caption&quot;:&quot;Check out the breakdown of Ahsaas's original article which we published last year!&quot;,&quot;cta&quot;:&quot;Read full 
story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The Art of Substitution: Instacart&#8217;s ML Model for Better Shopping Choices&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-01-12T23:01:15.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!9h_o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00a24c5b-b7a6-4d5c-9bc2-0e7b691d7d75_4800x2700.webp&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/the-art-of-substitution-instacarts-ml-model&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:154057578,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How DoorDash Saves Tens of Millions of Dollars Per Year by Detecting Fraud 30× Faster]]></title><description><![CDATA[A daily anomaly detection system that cut discovery 
time from 100+ days to under three.]]></description><link>https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Fri, 23 Jan 2026 05:56:24 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how DoorDash uses anomaly detection to save millions of dollars by flagging fraud trends early. </p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Roe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Roe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!2Roe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/175671629?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Roe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!2Roe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). 
The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to DoorDash&#8217;s fraud detection!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5320" height="3377" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3377,&quot;width&quot;:5320,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a close up of a cell phone on a table&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a close up of a cell phone on a table" title="a close up of a cell phone on a table" srcset="https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1648091855444-76f97897dcd4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxkb29yZGFzaHxlbnwwfHx8fDE3NjkxNDc0MjB8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@querysprout">Marques Thomas</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div>
<h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Fraud trends at DoorDash often blended into normal delivery noise and went unnoticed for weeks, causing avoidable losses. Existing detection was reactive and too slow.</p><h4><strong>Task</strong></h4><p>Detect emerging fraud trends early across millions of users and segments, before they materially impact top-line metrics.</p><h4><strong>Action</strong></h4><p>Built a daily anomaly detection platform that aggregates key fraud metrics across hundreds of millions of overlapping segments, applies time-series z-score detection, clusters related anomalies and routes them into an ops investigation workflow.</p><h4><strong>Result</strong></h4><p>Cut average fraud detection time from 100+ days to under 3 days, surfaced 60%+ of new fraud trends early and saved tens of millions of dollars annually.</p><h4><strong>Use Cases</strong></h4><p>Anomaly detection, fraud detection, payment monitoring, policy change impact monitoring</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Airflow, DuckDB, Apache Spark, Python</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Fraud trend detection before it becomes a headline</h4><p>Fraud doesn&#8217;t always kick the door down. Sometimes it slips in through the side window and blends into the noise of millions of legitimate deliveries.</p><p>A small spike in refund claims. A pattern in high-risk charges linked to a specific bank. A subtle shift in behavior that looks like randomness until it isn&#8217;t. Left alone, those early signals can snowball into a large trend with real top-line impact.</p><p>DoorDash&#8217;s fraud team wanted to flip the script. 
Instead of reacting after a new fraud trend has had weeks to grow unchecked, how could they spot it as early as possible, before significant damage is done?</p><p>This post shares how the DoorDash team built an anomaly detection platform that scans for emerging patterns across millions of user segments and surfaces the ones that matter before they spiral into major losses.</p><div><hr></div><h4>Terminology</h4><p>&#8216;Anomaly detection&#8217; is a broad term. Even within fraud, people can mean very different things by it. For this system, DoorDash defined two categories up front:</p><p><strong>Anomalous trend detection</strong></p><p>Looking for anomalous behavior in a <em>collection</em> of users that may represent a new fraud or false-positive trend.</p><p>Here, no single datapoint needs to be weird. The anomaly is the time-series pattern that emerges from many points together, like a growing fraud segment over time.</p><p><strong>Anomalous outlier detection</strong></p><p>Looking for <em>individual</em> outliers, like a specific user or transaction that is rare or deviates sharply from normal behavior.</p><p>In this case, the datapoint is the anomaly. 
It might be part of a broader trend, or it might be a one-off.</p><p>This post focuses on how DoorDash built a system to detect <strong>anomalous trends</strong>.</p><p>Here are the key terms used in this article, with definitions and examples to make them easier to understand.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a0Fu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a0Fu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 424w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 848w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1272w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png" width="1456" height="507" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:507,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:132407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a0Fu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 424w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 848w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1272w, https://substackcdn.com/image/fetch/$s_!a0Fu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98747126-080a-4ec0-954e-03e2ce6fdeb8_2173x757.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Terminology table (Source: DoorDash)</figcaption></figure></div><div><hr></div><h4><strong>Designing the system around real fraud failures</strong></h4><p>The DoorDash team started the way you&#8217;d hope a fraud platform starts: by talking to the people who have to use it.</p><p>They met with frontline fraud teams responsible for tracking and fighting new fraud trends and asked for concrete historical examples of trends that simmered longer than ideal before being discovered and mitigated. 
These became the positive test cases.</p><p>Next, the teams were asked for:</p><ul><li><p>Their most useful early-warning indicator <strong>metrics</strong></p></li><li><p>The <strong>dimensions</strong> they commonly use to slice data when investigating a new fraud trend</p></li></ul><p>That produced a working set of:</p><ul><li><p>Positive examples (historical missed or late-found fraud trends)</p></li><li><p>A set of metrics that act as early-warning signals</p></li><li><p>A set of dimensions that represent how investigators naturally segment the world</p></li></ul><p>Then the DoorDash team built the system and backtested it. Tuning came next, with two very specific goals:</p><ol><li><p>Maintain 100% recall on the historical test trends</p></li><li><p>Minimise the number of non-fraudulent anomalies flagged per day</p></li></ol><p>One observation stood out from this phase. The system was fairly insensitive to the exact tuning values. What mattered more was upstream: choosing thoughtful metrics and dimensions that can actually capture fraud trends in the first place.</p><p>In other words, the math is important, but the slices you choose decide what you can see at all.</p><div><hr></div><h4>Architecture overview</h4><p>The anomaly detection platform runs as a daily job coordinated by Airflow. 
It looks for fraud trends growing on a day-to-week timescale.</p><p>DoorDash currently runs anomaly detection jobs for both <strong>consumer fraud</strong> and <strong>Dasher fraud</strong>, with plans to expand to more applications over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Osej!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Osej!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 424w, https://substackcdn.com/image/fetch/$s_!Osej!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 848w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1272w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Osej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp" width="1024" height="268" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:268,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43376,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Osej!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 424w, https://substackcdn.com/image/fetch/$s_!Osej!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 848w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1272w, https://substackcdn.com/image/fetch/$s_!Osej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b856a6b-f238-472b-a3a3-8ab309843cf8_1024x268.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">DoorDash anomaly detection platform (Source: DoorDash)</figcaption></figure></div><p>The platform has five steps:</p><ol><li><p>Preparing daily fraud snapshots</p></li><li><p>Metric aggregation on multi-dimensional segments</p></li><li><p>Time-series anomaly detection</p></li><li><p>Hierarchical clustering on anomalous segments</p></li><li><p>Turning clusters into investigations and containment</p></li></ol><div><hr></div><h4>Step 1: Preparing daily fraud snapshots</h4><p>DoorDash chose a daily batch job for the initial implementation because the fraud trends they historically missed developed over <strong>a few days to a few weeks</strong>.</p><p>For each anomaly detection job, an Airflow DAG prepares a dataset containing the day&#8217;s data snapshot in wide-table format.</p><p>If the trends you historically 
missed unfold across days and weeks, you do not need sub-second streaming to get meaningful wins. You need consistency, coverage and a reliable cadence.</p><div><hr></div><h4>Step 2: Metric aggregation on multi-dimensional segments</h4><p>This is the scale step. Once the daily snapshot is ready, DoorDash loads that day&#8217;s data into a Python environment via Spark, then computes metric aggregates across segments.</p><p>For each metric, they track both:</p><ul><li><p><strong>Absolute value</strong> of the metric</p><ul><li><p>Example: dollar value of credit and refund claims</p></li></ul></li><li><p><strong>Relative (normalized) value</strong> of the metric</p><ul><li><p>Example: credit and refund claims divided by dollar value of orders</p></li></ul></li></ul><p>Why both? Because absolute values catch &#8216;this is costing real money&#8217;, while relative values catch &#8216;this is spiking compared to what is normal for this slice&#8217;.</p><p>Then comes segmentation. Segments are formed from each individual dimension plus every two- and three-way product (combination) of dimensions. The count grows combinatorially: at DoorDash scale it runs into hundreds of millions of segments, so efficient compute becomes essential.</p><p><strong>DuckDB for aggregation</strong></p><p>DoorDash computes metric aggregates using DuckDB, an in-process analytical database that runs inside the Python environment and is optimised for fast OLAP-style operations.</p><p>They chose DuckDB because it was:</p><ul><li><p>Much faster, with aggregation taking less than 10 minutes</p></li><li><p>More memory-efficient than Pandas</p></li></ul><p>The system also excludes dimensional products with cardinality greater than 10<sup>7</sup>, which keeps the total number of segments to a manageable size.</p><p>Finally, the storage format.</p><p>The day&#8217;s metrics, aggregated across hundreds of millions of segments, are stored in the data warehouse in a <strong>sparse, tall-table format</strong>.</p><p>In plain English: if a segment has a metric value of zero, DoorDash drops it. 
That cuts storage and keeps both DuckDB and the downstream warehouse from filling up with rows that say &#8216;nothing happened here.&#8217;</p><div><hr></div><h4>Step 3: Time-series anomaly detection</h4><p>After Step 2, DoorDash has daily metric aggregates by segment. They keep the previous 28 days of data in the data warehouse, so the platform now has several hundred million metric time series, each of length 28.</p><p>DoorDash chose a simple <strong>moving-window z-score</strong> approach because it performed well in testing and detected all of the historical fraud trends they used as positive examples.</p><p><strong>Baseline and test setup</strong></p><ul><li><p>The first <strong>21 days</strong> form the baseline</p></li><li><p>The <strong>28th day</strong> is the test day</p></li><li><p>There is a <strong>7-day gap</strong> between the last baseline day (day 21) and the test day (day 28), so days 22 to 27 are left out of the baseline</p></li></ul><p>That gap exists for a very specific reason. The team noticed that many historical fraud trends went through a noisy phase when they first started scaling. Leaving a gap keeps that noise out of the baseline, so the baseline variance better reflects &#8216;normal before the trend&#8217;. If those noisy days were included, they would inflate the baseline standard deviation, shrink the test day&#8217;s z-score and cause more trends to be missed.</p><p><strong>What counts as an anomaly</strong></p><p>A segment&#8217;s time series is flagged as anomalous if it meets both conditions:</p><ol><li><p><strong>Statistical significance: </strong>The 28th-day <em>relative</em> metric is more than X standard deviations above the mean of the 21-day baseline (that is, its z-score exceeds X). DoorDash found <strong>6 standard deviations</strong> worked well empirically.</p></li><li><p><strong>Business significance: </strong>The 28th-day <em>absolute</em> metric exceeds the 21-day baseline by a dollar value and/or count that is meaningful for that metric. Thresholds vary by metric and were chosen with operations partners.</p></li></ol><p>That two-part rule matters. Statistical significance alone finds weirdness. 
Business significance filters it down to weirdness that&#8217;s worth a human&#8217;s time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-sSP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-sSP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 424w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 848w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1272w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-sSP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp" width="1024" height="595" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:595,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-sSP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 424w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 848w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1272w, https://substackcdn.com/image/fetch/$s_!-sSP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30486ff1-68ef-4ae3-bc16-a96093efa794_1024x595.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Anomaly calculation example (Source: DoorDash)</figcaption></figure></div><div><hr></div><h4><strong>Step 4: Hierarchical clustering on anomalous segments</strong></h4><p>Real fraud trends rarely show up as a single clean segment anomaly. A single trend often triggers anomalous increases across many partially overlapping segments. 
Example:</p><p>A spike in credit and refund claims at &#8216;Retailer One&#8217; could cause anomalies in segments like:</p><ul><li><p><code>{business_name='Retailer One'}</code></p></li><li><p><code>{country='US', business_name='Retailer One'}</code></p></li><li><p><code>{business_vertical='retail', business_name='Retailer One'}</code></p></li></ul><p>So Step 4 exists to shrink &#8216;thousands of anomalies&#8217; into &#8216;a few dozen things to look at&#8217;.</p><p><strong>Segment graph structure</strong></p><p>Dimensional segments have a natural structure that can be represented as a three-layer graph:</p><ul><li><p><strong>Top layer:</strong> singlets</p><ul><li><p><code>{business_name='Retailer One'}</code></p></li></ul></li><li><p><strong>Middle layer:</strong> pairs</p><ul><li><p><code>{business_name='Retailer One', country='US'}</code></p></li></ul></li><li><p><strong>Bottom layer:</strong> triplets</p><ul><li><p><code>{business_name='Retailer One', country='US', checkout_platform='iOS'}</code></p></li></ul></li></ul><p>DoorDash further partitions the graph by <code>METRIC_NAME</code> so that clustering only happens within a single metric type.</p><p><strong>Clustering rules</strong></p><p>To connect anomalies within the same metric type:</p><ol><li><p><strong>Connect parent anomalies with child anomalies</strong></p><ul><li><p><code>{business_name='Retailer One'}</code> is the parent of <code>{country='US', business_name='Retailer One'}</code></p></li><li><p><code>{country='US', business_name='Retailer One'}</code> is the parent of <code>{business_name='Retailer One', country='US', checkout_platform='iOS'}</code></p></li></ul></li><li><p><strong>Connect sibling anomaly triplets</strong> if they share <strong>2 of their 3</strong> key-value pairs</p><ul><li><p><code>{business_name='Retailer One', country='US', checkout_platform='iOS'}</code><br>connects with<br><code>{business_name='Retailer One', country='US', business_vertical='retail'}</code></p></li></ul></li></ol><p>Then 
DoorDash runs a graph partition algorithm to find connected anomaly clusters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FXWL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FXWL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 424w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 848w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1272w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FXWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp" width="912" height="317" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2331571-5361-4687-9c9e-661654117e83_912x317.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:317,&quot;width&quot;:912,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23418,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/185495640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FXWL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 424w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 848w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1272w, https://substackcdn.com/image/fetch/$s_!FXWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2331571-5361-4687-9c9e-661654117e83_912x317.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Red circles indicate anomalous segments, while grey circles indicate non-anomalous segments. (Source: DoorDash)</figcaption></figure></div><p><strong>Picking a representative segment</strong></p><p>Ops teams review each cluster starting from a single representative segment, chosen using a fitness function:</p><pre><code><code>fitness = abs_anom_amt * rel_amt / level^1.2
</code></code></pre><p>Where:</p><ul><li><p><code>abs_anom_amt</code> = 28th-day absolute metric minus its 21-day baseline</p></li><li><p><code>rel_amt</code> = relative (normalized) 28th-day metric within the segment</p></li><li><p><code>level</code> = 0 for singlets, 1 for pairs, 2 for triplets</p></li></ul><p>The intuition:</p><ul><li><p><code>abs_anom_amt</code> behaves a bit like &#8216;how much impact&#8217; (think recall)</p></li><li><p><code>rel_amt</code> behaves a bit like &#8216;how concentrated&#8217; (think precision)</p></li><li><p>dividing by a weak function of <code>level</code> biases the choice toward simpler segments</p></li></ul><p>So the representative is usually a segment that is impactful, concentrated and not needlessly specific.</p><p><strong>What volume looks like in practice</strong></p><p>In production, DoorDash typically sees anomalies in several thousand segments per day. Clustering reduces that to <strong>20 to 60 anomalous clusters per day</strong> across consumer and Dasher fraud areas, a volume the operations team can realistically investigate.</p><div><hr></div><h4><strong>Step 5: Turning clusters into investigations and containment</strong></h4><p>Detection is not the finish line; it is just the trigger.</p><p>The representative anomalous segments, along with all other segments in the cluster and example events (deliveries and Dasher assignments), are available in a workflow tool for ops investigation.</p><p>Ops agents review example deliveries or assignments within the representative segment, looking for patterns that may point to a new fraud trend.</p><p>Sometimes the pattern is non-fraudulent, like a new promotion causing a spike in refunds. 
Other times it is fraudulent.</p><p>When a trend is deemed fraudulent:</p><ul><li><p>it is root-caused in partnership with engineering and product teams, so the underlying weakness can be fixed</p></li><li><p>a separate containment team runs queries to identify and stop fraudsters matching the trend pattern until the product fixes land</p></li></ul><p>So the system is not just detection. It&#8217;s detection wired into investigation, containment and longer-term remediation.</p><div><hr></div><h4>Results</h4><p>DoorDash now uses the anomaly detection platform as its primary early-warning source for new fraud trends.</p><p>Key results reported by the team:</p><ul><li><p>More than <strong>60%</strong> of all new fraud trends today are found through anomaly detection, and that share is growing as coverage expands.</p></li><li><p>Average time-to-detect new fraud trends dropped from <strong>more than 100 days</strong> to <strong>less than three days</strong> over the past year.</p></li><li><p>The platform saves <strong>tens of millions of dollars per year</strong> by flagging small but growing fraud trends before they get out of control.</p></li></ul><div><hr></div><h3>The full scoop</h3><p>To learn more, check out <a href="https://careersatdoordash.com/blog/doordash-anomaly-detection-platform-to-catch-fraud-trends">DoorDash's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input 
type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-doordash-saves-tens-of-millions-a-year-by-detecting-fraud?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;61ab4b4f-02dc-4ef8-a0eb-c6193f2bb650&quot;,&quot;caption&quot;:&quot;How do you handle search queries like &#8220;low-carb spicy chicken wrap with gluten-free tortilla&#8221; at scale?<br /><br />DoorDash rebuilt its search pipeline to better understand both user intent and product metadata. The result? 
A 30% increase in relevant results and measurable gains across key engagement metrics.<br /><br />This post breaks down the hybrid approach they used: combining LLMs, structured taxonomies and real-time retrieval without sacrificing speed or accuracy.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How DoorDash Used LLMs to Trigger 30% More Relevant Results&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-06-26T09:37:56.405Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!8K0n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f9be1c-e138-41d4-9596-b4cd02897f95_432x860.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-doordash-used-llms-to-trigger&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:166857110,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b7387bad-e21d-43d0-9227-adbed3439e2b&quot;,&quot;caption&quot;:&quot;Behind every 'smart' answer is a chain of fallible steps: retrieval, ranking, prompting and others.<br /><br />Dropbox Dash turned that complexity into a testable, measurable system.<br /><br />Here&#8217;s how they made their evaluation as rigorous as code.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Dropbox Made AI Evaluation Work at Scale&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-09T07:14:50.996Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!oMNY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a58eb26-c9ac-492d-96f2-343a7f503ddc_800x450.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-dropbox-made-ai-evaluation-work-at-scale&quot;,&quot;section_name&quot;:&quot;Data 
Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:175671629,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Grab Detects Data Issues across 100+ Kafka Topics Before They Spread]]></title><description><![CDATA[Real-time stream validation surfaces poison records early and notifies owners with context]]></description><link>https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 15 Jan 2026 04:15:57 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1624957083543-9a67140fabfd?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today we will look at how Grab detects data issues in real-time. 
</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Doc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Doc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif" width="800" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Doc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!6Doc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f76efe7-9d16-4ec3-8e25-0e3d2282705b_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Grab&#8217;s real-time work!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="6000" height="4000" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4000,&quot;width&quot;:6000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;man riding bicycle&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="man riding bicycle" title="man riding bicycle" srcset="https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1558899367-3cd83fb31ed8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxfHxncmFifGVufDB8fHx8MTc2NzkzMTIwOXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@javaistan">Afif Ramdhasuma</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Grab runs critical systems on Kafka streams, where bad data can spread and break downstream consumers. 
Existing checks were slow and mostly limited to schemas, making issues hard to catch and debug.</p><h4><strong>Task</strong></h4><p>Detect bad streaming data early, cover both schema and value-level issues and give stream owners fast, actionable visibility without centralizing ownership.</p><h4><strong>Action</strong></h4><p>Grab built contract-driven stream checks on Coban, turning schemas, field rules and ownership into real-time Flink SQL tests with Slack alerts and UI-based inspection of bad records.</p><h4><strong>Result</strong></h4><p>The system now monitors 100+ Kafka topics in real time, surfaces poison data quickly and helps teams stop issues before they cascade downstream.</p><h4><strong>Use Cases</strong></h4><p>Root cause analysis, real-time monitoring, real-time alerting</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Kafka, Apache Flink, Amazon S3, Slack, LLM</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4><strong>About Grab</strong></h4><p><a href="https://www.grab.com/">Grab</a> is often called the Uber of Southeast Asia, but that might be selling it short. What started as a ride-hailing app now powers food delivery, groceries, payments and even insurance, all bundled into one super app. It operates in over 800 cities across 8 Southeast Asian countries. Behind the rides, meals and payments lies an enormous stream of events flowing through Grab&#8217;s systems.</p><div><hr></div><h4>Background</h4><p>Grab runs much of its business on streaming data. Kafka topics feed online systems, offline analytics and machine learning pipelines. When those streams are clean, life is good: teams move faster, models behave and dashboards run smoothly. When they aren&#8217;t, it&#8217;s a major headache.</p><p>The tricky part is that &#8216;bad data&#8217; in Kafka isn&#8217;t always obvious. 
Sometimes it&#8217;s quiet: the stream still parses, but key fields are wrong, missing or shaped differently from what downstream teams assume.</p><p>That&#8217;s why Grab introduced a platform-level solution: Kafka stream contracts that let stream stakeholders define what &#8216;good&#8217; looks like, while the platform tests the streams in real time, catches issues as they happen and alerts the owners quickly.</p><p>The core idea is simple:</p><ul><li><p>Let users define a data contract for a Kafka topic</p></li><li><p>Convert that contract into executable tests</p></li><li><p>Run those tests continuously</p></li><li><p>Capture the poison data plus context</p></li><li><p>Notify the right people with enough detail to act</p></li></ul><p>This supports a more decentralized, data-mesh-style world where teams own their data products while the overall system stays reliable for everyone else.</p><div><hr></div><h4>What wasn&#8217;t working before</h4><p>Historically, Kafka stream processing at Grab lacked a strong, end-to-end solution for validating data quality. That created three big issues: detecting bad data, speed of detection and lack of visibility.</p><p><strong>1- Detecting bad data</strong></p><p>This can be broken down into two further categories:</p><p><strong>1.1 Schema issues</strong></p><p>These are schema mismatches between producers and consumers that can trigger deserialization errors. Even if backward compatibility is validated during schema evolution, the data inside a Kafka topic can still drift from the defined schema.</p><p>One concrete example: a rogue producer writes to a topic without using the expected schema. Now you&#8217;ve got a topic that &#8216;has a schema&#8217; but real events don&#8217;t match it. 
The painful bit is not just knowing that something broke; it&#8217;s identifying which fields are causing the mismatch.</p><p><strong>1.2 Rule and value issues</strong></p><p>These are disagreements about what a field <em>means</em> or what shape it should take. Kafka stream schemas define structure, but they don&#8217;t enforce rules such as:</p><ul><li><p>expected length for an identifier</p></li><li><p>expected string pattern</p></li><li><p>valid numeric ranges</p></li><li><p>constant values that should never change</p></li></ul><p>There was no existing framework in which stakeholders could define and enforce field-level semantic rules for streams.</p><p><strong>2- Speed of detection</strong></p><p>There was no real-time mechanism to automatically validate data against predefined rules, identify issues quickly and alert stakeholders promptly.</p><p>Without real-time validation, issues could linger, quietly impacting multiple online and offline downstream systems before anyone discovered them.</p><p><strong>3- Lack of visibility</strong></p><p>Even when teams did detect a problem, it was hard to pinpoint the exact &#8216;poison data&#8217; and understand what violated the schema or the semantic expectations.</p><p>Root cause analysis was painful because teams couldn&#8217;t easily answer:</p><ul><li><p>Which records were bad?</p></li><li><p>Which fields failed?</p></li><li><p>What did the bad values look like?</p></li><li><p>When did it start and how often did it happen?</p></li></ul><div><hr></div><h4>The fix</h4><p>Grab&#8217;s Coban platform provides a standardized, platform-level data quality testing and observability setup for Kafka streams. 
It&#8217;s built around four core ideas:</p><ol><li><p><strong>Data Contract Definition: </strong>Stream stakeholders define a contract that includes schema agreements, semantic rules the topic data must follow, and ownership metadata for alerts and notifications.</p></li><li><p><strong>Automated Test Execution: </strong>A long-running test runner automatically executes real-time tests based on that contract.</p></li><li><p><strong>Real-time Data Quality Issue Identification: </strong>The system detects data issues in real time at both schema and rules/values levels.</p></li><li><p><strong>Alerts and Result Observability: </strong>It alerts the right people and makes it easier to observe issues through the platform UI and downstream tooling.</p></li></ol><p>Put simply: define the rules once, then let the platform watch the stream continuously.</p><p>The architecture has three main components:</p><ol><li><p><strong>Data contract definition</strong></p></li><li><p><strong>Test execution and data quality issue identification</strong></p></li><li><p><strong>Result observability</strong></p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!opMG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!opMG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 424w, https://substackcdn.com/image/fetch/$s_!opMG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!opMG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg" width="1456" height="543" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:543,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224037,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!opMG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!opMG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 848w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!opMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5a043c-954d-4cfc-b21e-a3dfd346fe47_1991x743.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Real-time Kafka Stream Data Quality Monitoring Architecture (Source: Grab)</figcaption></figure></div><p>All &#8216;Flow&#8217; references from here on point to the numbered steps in the diagram above.</p><div><hr></div><h4><strong>Data contract definition</strong></h4><p>Coban&#8217;s contract acts as a formal agreement among Kafka stream stakeholders. It includes a few building blocks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KSXy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KSXy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg" width="836" height="758" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:836,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:120852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KSXy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KSXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73bb2962-4b96-49b4-ad1d-7709b39753d7_836x758.jpeg 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Kafka Stream Schema (Flow 1.1)</strong></p><p>The contract includes the schema used by the Kafka topic under test. This lets the Test Runner validate schema compatibility across data streams.</p><p>Importantly, this is not only about &#8220;did the schema change?&#8221; It&#8217;s also about &#8220;does the data actually match what everyone believes the schema is?&#8221;</p><p><strong>Kafka Stream Configuration (Flow 1.2)</strong></p><p>This includes essential configuration such as the Kafka endpoint and topic name. 
Coban populates this automatically, so users don&#8217;t have to wire everything up by hand.</p><p><strong>Observability Metadata (Flow 1.3)</strong></p><p>This is where ownership becomes real. The contract includes contact details for stream stakeholders and alert configurations so the right people get notified when issues show up.</p><p><strong>Kafka Stream Semantic Test Rules (Flow 1.5)</strong></p><p>This is the heart of the semantic side. Users can define intuitive field-level rules such as:</p><ul><li><p>string pattern checks</p></li><li><p>number range checks</p></li><li><p>constant value checks</p></li></ul><p>The point is to make the &#8220;meaning&#8221; of fields enforceable, not just their data types.</p><p><strong>LLM-Based Semantic Test Rules Recommendation (Flow 1.4)</strong></p><p>Defining dozens or hundreds of field rules by hand can overwhelm people. To reduce that setup burden, Coban uses an LLM-based feature that recommends semantic test rules based on:</p><ul><li><p>the provided Kafka stream schema</p></li><li><p>anonymized sample data</p></li></ul><p>This feature helps users set up semantic rules efficiently, as shown in the sample UI below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pu8X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pu8X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 424w, 
https://substackcdn.com/image/fetch/$s_!pu8X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 848w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1272w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png" width="1456" height="522" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:241835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!pu8X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 424w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 848w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1272w, https://substackcdn.com/image/fetch/$s_!pu8X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff96422b2-5c1c-4f0a-8fb8-af92a409d344_1999x717.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Sample UI showcasing LLM-based Kafka stream schema field-level semantic test rules (Source: Grab)</figcaption></figure></div><p>The practical benefit: users get a starting point quickly, instead of staring at a schema and trying to invent rules from scratch.</p><div><hr></div><h4><strong>Data contract transformation</strong></h4><p>Once a contract is defined, Coban&#8217;s transformation engine converts it into configurations the Test Runner can interpret (Flow 2.1).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nvEa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nvEa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg" width="1122" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1122,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d4065d7-b4c5-4f78-8761-0addce18f606_1122x660.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nvEa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!nvEa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nvEa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa26d3201-18ad-4d6f-af2f-727f179fc238_1122x660.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p>This transformation covers four things:</p><p><strong>Kafka Stream 
Schema: </strong>The contract schema is translated into a schema reference format the Test Runner can parse.</p><p><strong>Kafka Stream Configuration: </strong>The Kafka stream is set up as a source for the Test Runner.</p><p><strong>Observability metadata: </strong>Contact information is turned into runtime configs for alerting and routing.</p><p><strong>Kafka Stream Semantic Test Rules: </strong>Human-readable semantic rules are transformed into an <strong>inverse SQL query</strong> that captures data violating the rules.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SeoF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SeoF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg" width="1456" height="815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:213548,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SeoF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SeoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00cfa6bd-52be-4143-a79d-caeb71276749_1972x1104.jpeg 
1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Illustration of semantic test rules being converted from human-readable formats into inverse SQL queries (Source: Grab)</figcaption></figure></div><p>&#8216;Inverse SQL&#8217; here means the query is designed to return the <em>bad rows</em>, not the good ones (a rule like &#8216;amount must be positive&#8217;, for example, would become a query that selects every record whose amount is not positive). 
That&#8217;s a smart design choice because it keeps the output focused on what needs investigation.</p><div><hr></div><h4>Test execution &amp; data quality issue identification</h4><p>Once the transformation engine generates the configuration, the platform automatically deploys the Test Runner.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y-bs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg" width="1010" height="734" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:734,&quot;width&quot;:1010,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96110,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8dc273-8996-4ec1-a825-41a85d232746_1010x734.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y-bs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y-bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd8212e-0700-4411-be3d-384ee376c7ec_1010x734.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Test Runner</strong></p><p>The Test Runner uses FlinkSQL as its compute engine. FlinkSQL was chosen because rules can be written directly as SQL statements, which also makes it easier for the platform to translate contracts into enforceable checks.</p><p><strong>Test execution workflow and problematic data identification</strong></p><p>The Test Runner executes the tests and identifies problematic data in four steps:</p><ol><li><p><strong>Consume Kafka data (Flow 2.2)</strong><br>FlinkSQL consumes data from the Kafka topic under test using its own consumer group. 
Because that group tracks its own offsets, the tests do not interfere with any other consumers of the topic.</p></li><li><p><strong>Run inverse SQL (Flow 2.3)</strong><br>The Test Runner runs the inverse SQL query to identify:</p><ul><li><p>data that violates semantic rules</p></li><li><p>data that is syntactically incorrect in the first place (i.e. does not conform to the contract schema)</p></li></ul></li><li><p><strong>Publish data quality issue events (Flow 3.2)</strong><br>When bad data is found, the Test Runner packages it into a data quality issue event enriched with:</p><ul><li><p>a test summary</p></li><li><p>total count of bad records</p></li><li><p>sample bad data</p></li></ul><p>Then it publishes the event to a dedicated Kafka topic.</p></li><li><p><strong>Sink events to S3 (Flow 3.1)</strong><br>The platform also sinks all data quality events to an AWS S3 bucket for deeper observability and analysis.</p></li></ol><p>This combination (Kafka for real-time events, S3 for deeper inspection) provides both fast alerting and a durable store for later analysis.</p><div><hr></div><h4>Result observability</h4><p>Grab&#8217;s in-house data quality observability platform, Genchi, consumes the problematic data captured by the Test Runner (Flow 3.3).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n2A8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n2A8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!n2A8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg" width="838" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:838,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64056,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n2A8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 
424w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 848w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!n2A8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b376a29-23b4-420d-b72c-def7101963c5_838x618.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Source: Grab</figcaption></figure></div><p><strong>Alerting</strong></p><p>Genchi sends Slack notifications to stream owners listed in the contract&#8217;s observability metadata (Flow 3.5).</p><p>Those notifications include useful debugging context such as:</p><ul><li><p>links to sample data in the Coban UI</p></li><li><p>observed time windows</p></li><li><p>counts of bad records</p></li><li><p>other relevant details</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!avzo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!avzo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 424w, https://substackcdn.com/image/fetch/$s_!avzo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 848w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1272w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png" width="1314" height="478" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f156f000-5325-4c18-b58c-1987c5cac707_1314x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:1314,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!avzo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 424w, https://substackcdn.com/image/fetch/$s_!avzo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 848w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1272w, https://substackcdn.com/image/fetch/$s_!avzo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff156f000-5325-4c18-b58c-1987c5cac707_1314x478.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Sample Slack notifications (Source: Grab)</figcaption></figure></div><p>The key point is that alerts do not just say &#8216;something broke&#8217;; they include the information you need to start investigating.</p><p><strong>Observability</strong></p><p>Users can access the Coban UI (Flow 3.4) to see:</p><ul><li><p>Kafka stream test rules</p></li><li><p>sample bad records</p></li><li><p>highlighted fields and values that violate rules</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!iqrn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iqrn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg" width="1456" height="456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183755897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iqrn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!iqrn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d9e3d9b-5030-4a02-b677-8bd277354196_1551x486.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">The highlighted fields indicate violations of the semantic test rules (Source: Grab)</figcaption></figure></div><p>That UI piece matters because it shortens the path from &#8216;alert received&#8217; to &#8216;I know which field is failing and what the bad values look like.&#8217;</p><div><hr></div><h4>Results so far</h4><p>Since its deployment earlier in the year, this solution has enabled Kafka stream users to:</p><ul><li><p>define contracts with both schema and semantic rules</p></li><li><p>automate real-time test execution</p></li><li><p>alert stakeholders when problematic data is detected so they can act quickly</p></li></ul><p>It has been actively monitoring data quality across <strong>100+ critical Kafka topics</strong>.</p><p>The solution also offers the capability to immediately identify and halt the propagation of 
invalid data across multiple streams.</p><div><hr></div><h4>Wrapping up</h4><p>Grab implemented and rolled out a real-time data quality monitoring solution for Kafka streams through the Coban platform.</p><p>The key outcomes include:</p><ul><li><p>engineers can define syntactic and semantic tests through a data contract</p></li><li><p>tests run automatically in real time via a long-running Test Runner based on FlinkSQL</p></li><li><p>issues trigger fast Slack alerts through Genchi using ownership metadata in the contract</p></li><li><p>teams get better visibility into exactly which data fields violate rules via the Coban UI</p></li></ul><p>In short: Coban turned data quality from a vague hope into something stream owners can specify, enforce and observe in real time.</p><div><hr></div><h3>The full scoop</h3><p>To learn more, check out <a href="https://engineering.grab.com/real-time-data-quality-monitoring">Grab's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. Really appreciate it &#128591;</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-grab-detects-data-issues-across-100-kafka-topics?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;2b5e61e3-2de5-4088-981d-80de61411bd4&quot;,&quot;caption&quot;:&quot;Uber rebuilt its data lake ingestion to move freshness from hours to minutes.<br /><br />This piece breaks down how they replaced batch Spark jobs with Flink streaming, cut compute by 25% and dealt with the very real problems that show up at petabyte scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Uber Cut Data Lake Freshness From Hours to Minutes With Flink&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-02T04:30:31.300Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:182833470,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:1,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1904c23f-5462-4150-9c60-b6ad712234b6&quot;,&quot;caption&quot;:&quot;How do you keep ML teams fast when every experiment blasts your Spark cluster with spiky workloads, huge datasets and five different file formats?<br /><br />Snap&#8217;s answer: Prism, a unified Spark platform that hides infra pain, standardises pipelines and supports everything from ad-hoc exploration to 10k+ daily jobs in production.<br /><br />This post breaks down why raw Spark wasn&#8217;t enough, what Prism fixes and how Snap rebuilt their ML data layer without 
ditching Spark.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-20T04:59:47.340Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179211962,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in December 
2025]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in December 2025.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-december-2025</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 08 Jan 2026 05:01:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/29125fa4-9a37-40a2-a85c-c795fb77137f_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>It&#8217;s time for another round-up on all things data and AI!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4SpS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4SpS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!4SpS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4SpS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4SpS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!4SpS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!4SpS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a950004-135e-4245-99de-ab58d8aac3c1_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). The collection includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid.
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for December!</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://magazine.sebastianraschka.com/p/state-of-llms-2025?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The State Of LLMs 2025: Progress, Problems, and Predictions</a> (34 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;eb6310b4-4c18-45bc-a961-994bb151f1ac&quot;}" data-component-name="MentionToDOM">Sebastian Raschka</span> provides a great recap of the main developments in 2025, plus a couple of predictions for 2026 (such as classical RAG slowly fading away).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XAEE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!XAEE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 424w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 848w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1272w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XAEE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp" width="1456" height="892" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29522,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XAEE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 424w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 848w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1272w, https://substackcdn.com/image/fetch/$s_!XAEE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c719740-3865-4e4a-a7de-b6a296df733c_1456x892.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://read.futureproofds.com/p/building-a-data-cleaning-agent-with-langgraph?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Building a Data Cleaning Agent with LangGraph</a> (7 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Andres Vourakis&quot;,&quot;id&quot;:135808578,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ebf6fc-4ed6-47e1-938e-a1fa37a2347e_1601x1646.jpeg&quot;,&quot;uuid&quot;:&quot;c2d6e3c0-dced-406a-a9d3-3c6a5205ac34&quot;}" data-component-name="MentionToDOM">Andres Vourakis</span> shows how to build a LangGraph-based data cleaning agent that auto-generates, executes and fixes Python cleaning code to cut down on manual data prep.</p></li><li><p><strong><a href="https://www.leoniemonigatti.com/blog/memory-in-ai-agents.html?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Making Sense of Memory in AI Agents</a> (10 minute read)<br></strong>This post breaks down how different memory types (short-term, long-term and structured) let AI agents retain context across steps, so they can act coherently instead of responding statelessly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!OOyv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OOyv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 424w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 848w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1272w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OOyv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png" width="1455" height="1009" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1009,&quot;width&quot;:1455,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:425978,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OOyv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 424w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 848w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1272w, https://substackcdn.com/image/fetch/$s_!OOyv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24aa3bd7-be79-41eb-a754-1a5fa8ceaedc_1455x1009.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://towardsdatascience.com/exploring-tabpfn-a-foundation-model-built-for-tabular-data/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Exploring TabPFN: A Foundation Model Built for Tabular Data</a> (12 minute read)<br></strong>The article explores TabPFN, a foundation model pretrained on synthetic tasks that delivers strong tabular ML performance with near-zero tuning by reframing tabular prediction as conditional inference.</p></li><li><p><strong><a href="https://towardsdatascience.com/how-to-use-simple-data-contracts-in-python-for-data-scientists?utm_source=datatinkerer.io&amp;utm_medium=newsletter">How to Use Simple Data Contracts in Python for Data Scientists</a> (8 minute read)<br></strong>Eirik Berge walks through a lightweight data contract implementation in Python to catch schema 
breakages early.</p></li><li><p><strong><a href="https://www.anthropic.com/research/how-ai-is-transforming-work-at-anthropic">How AI is transforming work at Anthropic</a> (35 minute read)<br></strong>An interesting look at how engineers and researchers at Anthropic actually use AI day to day, and which parts of their work it genuinely helps with.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UkqY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UkqY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 424w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 848w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UkqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:52637,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UkqY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 424w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 848w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!UkqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d9cfb90-76ea-431a-bc72-01373989c923_3840x2160.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li></ul><ul><li><p><strong><a href="https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools?utm_source=datatinkerer.io&amp;utm_medium=newsletter">We removed 80% of our agent&#8217;s tools</a> (4 minute read)<br></strong>Vercel rebuilt their text-to-SQL agent by stripping away complex tooling and giving Claude direct file-system access, discovering that fewer tools, better documentation and &#8216;doing less&#8217; made the agent faster, cheaper and more reliable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0zIw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0zIw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 424w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 848w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1272w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0zIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png" width="650" height="325" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/321887da-7beb-4a82-987f-df165d353918_650x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:325,&quot;width&quot;:650,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26756,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0zIw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 424w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 848w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1272w, https://substackcdn.com/image/fetch/$s_!0zIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321887da-7beb-4a82-987f-df165d353918_650x325.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://www.ssp.sh/blog/omakase-data-stack/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Opinionated Data Platforms vs. 
Open-Source</a> (18 minute read)<br></strong>Good article by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Simon Sp&#228;ti&quot;,&quot;id&quot;:27855874,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6fc84efb-1b87-4fb3-bfb1-076664f32de4_2199x2199.jpeg&quot;,&quot;uuid&quot;:&quot;81c59bed-93c1-4b47-8b9b-26bc07109243&quot;}" data-component-name="MentionToDOM">Simon Sp&#228;ti</span> breaking down the tradeoffs between open-source and &#8216;opinionated&#8217; data platforms, and when it makes sense to go for the latter.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V-4O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V-4O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 424w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 848w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1272w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!V-4O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1018856,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V-4O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 424w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 848w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1272w, https://substackcdn.com/image/fetch/$s_!V-4O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0fe0f54-4b27-4634-aacb-6f48c4e8a983_2523x1267.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/llms-for-pdf-data-pipelines?utm_source=datatinkerer.io&amp;utm_medium=newsletter">LLMs for {PDF} Data Pipelines</a> (8 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Daniel 
Beach&quot;,&quot;id&quot;:21715962,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;uuid&quot;:&quot;a6ac8240-80c4-4811-b274-3c28f4f96ad7&quot;}" data-component-name="MentionToDOM"></span> experiments with LLMs as part of a data pipeline. He shows that agent-style PDF-to-JSON extraction can work in practice despite being slow, and may be good enough for real-world automation.</p></li><li><p><strong><a href="https://seattledataguy.substack.com/p/snowflake-vs-databricks-is-the-wrong?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Snowflake vs Databricks Is the Wrong Debate</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;b95e0031-6145-49e2-aae2-1898e3288485&quot;}" data-component-name="MentionToDOM"></span> argues that the Snowflake vs Databricks debate is a distraction. In his view, Databricks is deliberately expanding role by role to own the full data stack and compete with cloud and enterprise platforms.</p></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/data-quality-design-patterns-wap-awap?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Data Quality Design Patterns</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Erfan 
Hesami&quot;,&quot;id&quot;:277538242,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!rcW2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2692f-48e0-43a5-9f33-7eebb007bd6e_1641x1641.jpeg&quot;,&quot;uuid&quot;:&quot;024d3eec-b14c-4f30-92fd-7a250d572cdd&quot;}" data-component-name="MentionToDOM"></span> breaks down practical data quality design patterns like WAP, AWAP, TAP and signal tables, showing how teams balance safety, cost and speed to keep bad data out of production pipelines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2TeV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2TeV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 424w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 848w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1272w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2TeV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp" width="1423" height="358" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1423,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14380,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2TeV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 424w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 848w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1272w, https://substackcdn.com/image/fetch/$s_!2TeV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F625b6fd0-4fe7-447b-9a11-a2dc9c44e621_1423x358.webp 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://thepipeandtheline.substack.com/p/duckdb-the-swiss-army-knife-for-data?utm_source=datatinkerer.io&amp;utm_medium=newsletter">DuckDB: The Swiss Army Knife For Data Engineers</a> (8 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro 
Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;f97d74b7-0d12-49b2-929c-072c0400b984&quot;}" data-component-name="MentionToDOM"></span> argues that DuckDB can replace most pandas, Spark and Airflow workflows. It lets data engineers run fast, scalable analytics and ETL directly in SQL, with zero infrastructure and minimal complexity.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs">How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs</a> (14 minute read)<br></strong>This post breaks down how Snap rebuilt its ML platform around a unified Spark layer. The new layer tames spiky workloads, standardises pipelines and reliably runs 10,000+ production jobs a day without blowing up clusters.</p></li></ul><div><hr></div><h3>Data analysis and visualisation</h3><ul><li><p><strong><a href="https://www.scientificdiscovery.dev/p/salonis-guide-to-data-visualization?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Saloni&#8217;s guide to data visualization</a> (41 minute read)<br></strong>A great, comprehensive post by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Saloni Dattani&quot;,&quot;id&quot;:4267654,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bc76721-fe9b-4edc-bd5b-de3869518c08_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;8ad6a4c6-c6ca-4229-b0cd-fc658c40e0ad&quot;}" data-component-name="MentionToDOM"></span>, in which she distils data visualization down to first principles, showing how to choose charts, reduce clutter and design visuals that communicate insight 
instead of just decorating dashboards.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vUTS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vUTS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 424w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 848w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1272w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vUTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp" width="1456" height="1447" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1447,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:160262,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/183495145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vUTS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 424w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 848w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1272w, https://substackcdn.com/image/fetch/$s_!vUTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F568c84ce-9e5f-41c5-bda2-5f1e666ebc59_1456x1447.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<ul><li><p><strong><a href="https://dominicroye.github.io/blog/2025-12-14-broken-charts/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Broken Chart: discover 9 visualization alternatives</a> (10 minute read)<br></strong>Dominic Roy&#233; breaks down how common chart design mistakes distort interpretation, showing why many broken charts mislead viewers and how to fix them with clearer scales, context, and visual discipline.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://sqlpatterns.com/p/the-most-powerful-timeless-skill?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The Most Useful, Timeless Skill to Learn as a Data Professional</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ergest 
Xheblati&quot;,&quot;id&quot;:245231,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60c1d86e-f97d-4deb-8991-0662b9a07922_1024x1536.png&quot;,&quot;uuid&quot;:&quot;0790edb2-ac76-4182-89e9-6124a782d1dc&quot;}" data-component-name="MentionToDOM"></span> makes the case that real impact in data comes from leverage, not just from writing more lines of code.</p></li><li><p><strong><a href="https://joereis.substack.com/p/2026-general-thoughts-on-whats-ahead?utm_source=datatinkerer.io&amp;utm_medium=newsletter">2026 - General Thoughts on What&#8217;s Ahead</a> (6 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joe Reis&quot;,&quot;id&quot;:3531217,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e4716b1-c223-41e3-b943-def0291bf217_1175x783.jpeg&quot;,&quot;uuid&quot;:&quot;d09c89e5-6573-494f-8c0d-f91c443c3063&quot;}" data-component-name="MentionToDOM"></span> expects 2026 to be a deliberately &#8216;boring&#8217; year in which AI hype cools off and teams are forced to focus on the fundamentals that actually make AI work.</p></li><li><p><strong><a href="https://wrongbutuseful.substack.com/p/the-next-data-bottleneck?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The next data bottleneck</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Katie Bauer&quot;,&quot;id&quot;:5505029,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/25c27590-5e07-44b1-958d-0aaa70195e65_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;bf47ca09-d373-4704-86a7-af9669aab428&quot;}" data-component-name="MentionToDOM"></span> argues that as tools and models get better, the real constraint shifts to human bottlenecks like 
decision-making, ownership and an organisation&#8217;s ability to turn data into action.</p></li></ul><div><hr></div><h3><strong>Quick favor: need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:428078}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from this month that I missed? Feel free to drop a comment or hit reply. Even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;738bc346-aa09-4943-a656-9c97ecf88686&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from November&#128071;<br /><br />Data Science &amp;amp; AI<br />Context engineering becomes the real bottleneck for AI agents<br />Classic algorithms still beat most enterprise AI in ROI<br />A practical framework to identify true agentic use cases<br />Gemini 3 benefits from direct structured prompting<br /><br />Data Engineering<br />DuckLake revives relational metadata for lakehouses<br />Event streaming hits market saturation<br />Real-world consulting lessons point to simpler pipelines over hype<br />Dark data hoarding kills AI signal<br /><br />Data Analysis &amp;amp; BI<br />Dashboard testing gets a full end-to-end checklist<br />Guidance on balancing accuracy vs speed when answering business questions.<br /><br />Plus: AI-coded &#8220;good enough&#8221; apps shift the buy-vs-build boundary, low-tech industries become prime AI adopters as margins flip and new benchmark analysis suggests model performance is mostly general capability with a smaller &#8220;Claudiness&#8221; axis on top.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in November 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data 
Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-12-03T07:52:29.847Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c31550f6-1fdf-4738-b384-2eeb55f71662_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:180567973,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:3,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;27385512-431a-4d66-acc8-78b85b942c01&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from October&#128071;<br /><br />Data Science &amp;amp; AI<br />How Gradient Descent Works<br />Recursive Language Models<br />The Continual Learning Problem<br />Why Analytics Agents Break Differently<br /><br />Data Engineering<br />How Kafka Works<br />Data Modeling for the Agentic Era<br />You&#8217;ll Never Have a FAANG Data Infrastructure<br />Getting Started with OpenMetadata<br /><br />Data Analysis 
&amp;amp; BI<br />Jobs-to-be-Done: Designing dashboards for what users need to achieve.<br />From Dental Cleaning to Data Cleaning: Pivoting into healthcare analytics.<br /><br />Plus: Real AI Agents and Real Work, Taking the Bitter Lesson Seriously: Let AI optimize compute, not humans, OpenAI Is a Consumer Company, Import AI 431: Technological optimism meets appropriate fear&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in October 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-06T07:22:24.105Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a00481e6-bc3b-4419-9304-ed408b193853_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:178132882,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Uber Cut Data Lake Freshness From Hours to Minutes With Flink]]></title><description><![CDATA[Why Uber moved ingestion from Spark batch to Flink streaming and what it took to run thousands of jobs reliably at petabyte scale.]]></description><link>https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Fri, 02 Jan 2026 04:30:31 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Uber moved from batch to streaming in their data lake.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!05-P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!05-P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!05-P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!05-P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!05-P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!05-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2affdf-e44f-452e-b02a-93809ad29f60_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Uber&#8217;s streaming solution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4000" height="6000" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:6000,&quot;width&quot;:4000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;person holding black iphone 5&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="person holding black iphone 5" title="person holding black iphone 5" 
srcset="https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1615929362369-4af8423ebd68?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNHx8dWJlcnxlbnwwfHx8fDE3NjcwNzc2NDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@tingeyinjurylawfirm">Tingey Injury Law Firm</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Batch-based ingestion meant data arrived hours to days late, slowing experimentation, analytics and ML across Uber&#8217;s core business domains.</p><h4><strong>Task</strong></h4><p>Move ingestion to minutes-level freshness at petabyte scale while lowering compute cost and keeping operations reliable across thousands of datasets.</p><h4><strong>Action</strong></h4><p>Built IngestionNext using Flink streaming from Kafka to Hudi, plus a control plane for operating ingestion at scale. Solved streaming bottlenecks (small files, partition skew, checkpoint vs commit alignment) to keep performance and correctness intact.</p><h4><strong>Result</strong></h4><ul><li><p>Freshness improved from hours to <strong>minutes</strong>.</p></li><li><p>Compute usage reduced by <strong>~25%</strong> vs batch ingestion.</p></li><li><p>Compaction performance improved by <strong>~10x</strong> with row-group merging.</p></li></ul><h4><strong>Use Cases</strong></h4><p>Near-real-time analytics, personalisation, operational analytics</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, Apache Kafka, Apache Flink, Apache Hudi, Apache Parquet</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Why did data freshness become a platform priority at Uber?</h4><p>Uber&#8217;s data lake underpins much of the company&#8217;s analytics and machine learning. 
If a team wants to measure an experiment, monitor performance, train a model or sanity-check a business change, the first question is usually: is the data in the lake yet?</p><p>Historically, ingestion into the lake was batch-based and freshness was measured in hours. That was fine when decisions moved at the speed of daily reports. It starts to hurt when the business wants near-real-time loops: faster experiments, faster model iteration, faster detection of issues.</p><p>Over the past year, the team built and validated <strong>IngestionNext</strong>, a new ingestion system that makes streaming, rather than batch, the default. It&#8217;s centered on Apache Flink, reads events from Kafka, writes to the data lake in Apache Hudi format and operates at petabyte scale. Along the way, they had to solve the things that make streaming annoying in practice: small files, partition skew, checkpoint vs commit alignment and the operational problem of running thousands of jobs reliably.</p><div><hr></div><h4><strong>Why did batch ingestion become a bottleneck?</strong></h4><p>Two main reasons: <strong>freshness</strong> and <strong>efficiency</strong>.</p><p><strong>Freshness</strong></p><p>As the business sped up, teams across Delivery, Rider, Mobility, Finance and Marketing Analytics kept asking the same thing: &#8220;Can we get the data sooner?&#8221;</p><p>Batch ingestion creates delays measured in hours and sometimes days. That lag slows down iteration and decision-making. In a world of continuous experimentation and fast model cycles, hours of latency is basically a tax on everything.</p><p>By moving ingestion to Flink-based streaming, the team cut data latency from hours to minutes. That directly supports faster model launches, quicker experiments and more accurate analytics because the lake stays closer to what&#8217;s happening now.</p><p><strong>Efficiency</strong></p><p>Batch ingestion with Apache Spark is heavy by nature. 
Jobs run on a schedule, kick off distributed work at fixed intervals and keep doing that even when the workload is uneven. At Uber&#8217;s scale, with thousands of datasets and hundreds of petabytes, that adds up to hundreds of thousands of CPU cores running daily.</p><p>Streaming smooths this out. Instead of repeatedly spinning up large batch work, resources can scale with traffic in a more continuous way. The result is less scheduling overhead, fewer big-bang compute bursts and more efficient resource usage overall.</p><div><hr></div><h4><strong>IngestionNext: A streaming ingestion platform built for scale</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AYPR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AYPR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 424w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 848w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1272w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!AYPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif" width="768" height="349" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:349,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15051,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AYPR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 424w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 848w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1272w, https://substackcdn.com/image/fetch/$s_!AYPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f26c0c-b910-4450-916a-c2e372d9c9fa_768x349.avif 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">IngestionNext architecture (Source: Uber)</figcaption></figure></div><p>In the data plane, events land in Apache Kafka. Flink jobs consume those events and write them into the data lake using Apache Hudi, which provides transactional behavior such as commits, rollbacks and time travel. Freshness and completeness are measured end-to-end from source to sink, not just by whether the job ran.</p><p>Operating ingestion at this scale is not a set-it-and-forget-it exercise, so the team built a control plane focused on automation and safety. 
It manages the ingestion job lifecycle (create, deploy, restart, stop, delete), handles config changes and runs health verification. The goal is simple: run thousands of ingestion jobs consistently without turning the platform into a giant manual babysitting exercise.</p><p>The system also supports regional failover and fallback strategies. If a region has an outage, ingestion can shift to another region. If needed, jobs can temporarily fall back to batch mode so ingestion stays available and no data is lost.</p><div><hr></div><h4><strong>Solving the hard parts of streaming ingestion</strong></h4><p>Streaming buys freshness, but it also introduces new failure modes. The team highlighted three major ones: <strong>small files</strong>, <strong>partition skew</strong> and <strong>checkpoint/commit synchronization</strong>.</p><p><strong>Small files</strong></p><p>Streaming writes data continuously, which tends to create lots of small Parquet files. Small files are a classic way to degrade query performance while also increasing metadata and storage overhead. You get fresher data, then you pay for it every time someone queries.</p><p>The common compaction approach merges Parquet files record by record. That means each file gets decompressed, decoded from columnar format into rows, merged, then encoded and compressed again. 
It works but it&#8217;s expensive and slow because you keep doing encode/decode work over and over.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!25HV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!25HV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 424w, https://substackcdn.com/image/fetch/$s_!25HV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 848w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1272w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!25HV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif" width="768" height="527" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:527,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31057,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!25HV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 424w, https://substackcdn.com/image/fetch/$s_!25HV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 848w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1272w, https://substackcdn.com/image/fetch/$s_!25HV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8bb140-efab-484a-be52-7a93cf9214ba_768x527.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Parquet file merging row by row (Source: Uber)</figcaption></figure></div><p>To fix this, the team introduced row-group-level merging. Instead of dropping down into row format, the merge operates directly on Parquet&#8217;s native columnar structure. 
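</p><p>Here is a toy Python model of both compaction paths that makes the difference concrete. It is a sketch under stated assumptions, not Uber&#8217;s implementation. A &#8220;file&#8221; is modelled as a list of row groups, and each column chunk is simulated as JSON compressed with <code>zlib</code>. The names <code>merge_record_level</code> and <code>merge_row_groups</code> are hypothetical. The record-level path decodes every chunk into rows and re-encodes the result. The row-group path reuses the already-encoded row groups as they are.</p>

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Toy stand-in for a Parquet row group: column name -> compressed chunk.
    columns: dict


@dataclass
class CodecStats:
    decodes: int = 0  # chunks decompressed and decoded
    encodes: int = 0  # chunks encoded and compressed


def encode(values, stats):
    stats.encodes += 1
    return zlib.compress(json.dumps(values).encode())


def decode(chunk, stats):
    stats.decodes += 1
    return json.loads(zlib.decompress(chunk))


def write_file(rows, stats):
    """Encode a list of row dicts as a single-row-group 'file'."""
    columns = {name: encode([row[name] for row in rows], stats) for name in rows[0]}
    return [RowGroup(columns)]


def merge_record_level(files, stats):
    """Classic path: decode every chunk back into rows, then re-encode them all."""
    rows = []
    for file in files:
        for group in file:
            cols = {name: decode(chunk, stats) for name, chunk in group.columns.items()}
            n_rows = len(next(iter(cols.values())))
            rows.extend({name: vals[i] for name, vals in cols.items()} for i in range(n_rows))
    return write_file(rows, stats)


def merge_row_groups(files):
    """Row-group path: reuse the encoded row groups as-is, with no codec work."""
    return [group for file in files for group in file]


setup = CodecStats()
small_files = [write_file([{"id": i, "fare": i * 1.5}], setup) for i in range(4)]

record_stats = CodecStats()
merged_by_record = merge_record_level(small_files, record_stats)
merged_by_group = merge_row_groups(small_files)
print(record_stats, len(merged_by_record), len(merged_by_group))
```

<p>In this toy run, the record-level merge decodes all eight input chunks and re-encodes two. The row-group merge keeps the four original row groups, and their boundaries, without touching a single compressed byte. 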
That avoids the expensive recompression path and improves compaction performance by roughly an order of magnitude (around 10x).</p><p>There are open-source efforts exploring schema-evolution-aware merging, which use padding and masking to align differing schemas, but that approach adds implementation complexity and maintenance risk.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eGhg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eGhg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 424w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 848w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1272w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eGhg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif" width="768" height="347" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5298,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eGhg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 424w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 848w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1272w, https://substackcdn.com/image/fetch/$s_!eGhg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2dfa45-17b0-4679-9e59-439fe164f2d7_768x347.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Row-group merging with data masking (Source: Uber)</figcaption></figure></div><p>So the team took a simpler path: enforce schema consistency during merging. Only files with identical schema are merged together. 
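</p><p>Below is a minimal Python sketch of that rule. It assumes each small file is described by a path, a size and an ordered schema of (column, type) pairs. The <code>SmallFile</code> type, the <code>plan_merges</code> function and the 128 MB default target are illustrative assumptions, not details from Uber&#8217;s post. Files are bucketed by exact schema, and each bucket is packed into size-bounded batches, so no batch ever mixes schemas.</p>

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallFile:
    path: str
    size_mb: int
    schema: tuple  # ordered ((column, type), ...) pairs; must match exactly


def plan_merges(files, target_mb=128):
    """Bucket files by identical schema, then pack each bucket into batches."""
    by_schema = defaultdict(list)
    for f in files:
        by_schema[f.schema].append(f)

    batches = []
    for group in by_schema.values():
        batch, size = [], 0
        for f in sorted(group, key=lambda x: x.path):
            # Close the current batch if this file would push it past the target.
            if batch and size + f.size_mb > target_mb:
                batches.append(batch)
                batch, size = [], 0
            batch.append(f)
            size += f.size_mb
        if batch:
            batches.append(batch)
    # A single-file batch has nothing to merge with, so leave it for a later run.
    return [b for b in batches if len(b) > 1]


v1 = (("id", "int64"), ("fare", "double"))
v2 = v1 + (("tip", "double"),)  # evolved schema with an extra column
files = [
    SmallFile("a.parquet", 40, v1),
    SmallFile("b.parquet", 40, v1),
    SmallFile("c.parquet", 40, v2),
    SmallFile("d.parquet", 40, v2),
    SmallFile("e.parquet", 40, v1),
]
for batch in plan_merges(files, target_mb=100):
    print([f.path for f in batch])
```

<p>Here <code>e.parquet</code> is left for a later run because adding it would push the first batch past the 100 MB target. The evolved files are merged only with each other. 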
No masking, no low-level code modifications, less engineering overhead and still faster, more efficient and more reliable compaction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vAV6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vAV6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 424w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 848w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1272w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vAV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif" width="768" height="375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6507,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vAV6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 424w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 848w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1272w, https://substackcdn.com/image/fetch/$s_!vAV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3c4bbe0-5d7e-4e7e-ac99-706f52a1f0e9_768x375.avif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simplified row-group merging by grouping schemas (Source: Uber)</figcaption></figure></div><p><strong>Partition skew</strong></p><p>Streaming ingestion depends on steady consumption from Kafka across Flink subtasks. The messy reality is that short-lived downstream slowdowns, such as garbage collection pauses, can unbalance consumption. Some partitions get read faster than others, and you end up with skew.</p><p>Skew doesn&#8217;t just look ugly on a dashboard. 
It can reduce compression efficiency and lead to slower queries downstream.</p><p>The fixes came from three angles:</p><ul><li><p><strong>Operational tuning:</strong> aligning Flink parallelism with Kafka partitions and adjusting fetch parameters.</p></li><li><p><strong>Connector-level fairness:</strong> adding mechanisms like round-robin polling, pause/resume for heavy partitions and per-partition quotas.</p></li><li><p><strong>Observability:</strong> exposing per-partition lag metrics, adding skew-aware autoscaling and setting targeted alerts.</p></li></ul><p>This is a good reminder that streaming issues often show up first as weird lag and only later as &#8220;why are queries slower now?&#8221; If you can&#8217;t see skew clearly, you&#8217;ll chase symptoms forever.</p><p><strong>Checkpoint and commit synchronization</strong></p><p>Flink and Hudi each track progress, but they track different things.</p><ul><li><p><strong>Flink checkpoints</strong> track consumed offsets.</p></li><li><p><strong>Hudi commits</strong> track writes.</p></li></ul><p>If a failure lets these two drift out of sync, the system can skip data or duplicate it. In ingestion, either outcome is a serious problem.</p><p>The team solved this by extending Hudi commit metadata to embed Flink checkpoint IDs. With that linkage, recovery becomes deterministic during rollbacks or failovers. 
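</p><p>Below is a simplified Python model of that linkage. It is illustrative only and does not reproduce Hudi&#8217;s or Flink&#8217;s APIs. The <code>Commit</code> record, the <code>checkpoint_id</code> metadata key and <code>recovery_plan</code> are hypothetical, and the model assumes each completed checkpoint produces exactly one commit. On restart, the writer compares the restored checkpoint ID with the IDs embedded in completed commits. Commits tagged with a later checkpoint would be re-ingested when Kafka is replayed, so they are rolled back. If no commit carries the restored ID, that checkpoint&#8217;s write never landed and has to be re-committed rather than skipped.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    instant: str  # Hudi-style instant time of the commit
    metadata: dict = field(default_factory=dict)  # extra commit metadata


def recovery_plan(commits, restored_checkpoint_id):
    """Reconcile completed commits with the checkpoint Flink restored from.

    Assumes each commit stores the Flink checkpoint ID it was written for
    under the hypothetical metadata key "checkpoint_id".
    """
    tagged = [(c.instant, int(c.metadata["checkpoint_id"])) for c in commits]
    # Data committed after the restored checkpoint will be read from Kafka
    # again on replay, so roll those commits back to avoid duplicates.
    rollback = [instant for instant, cp in tagged if cp > restored_checkpoint_id]
    # The checkpoint completed but its commit never landed: re-commit the
    # pending write instead of skipping the offsets it covered.
    recommit_pending = all(cp != restored_checkpoint_id for _, cp in tagged)
    return {"rollback": rollback, "recommit_pending": recommit_pending}


commits = [
    Commit("20260301100000", {"checkpoint_id": "41"}),
    Commit("20260301100500", {"checkpoint_id": "42"}),
]
print(recovery_plan(commits, 42))  # in sync: nothing to do
print(recovery_plan(commits, 41))  # commit for checkpoint 42 is ahead
print(recovery_plan(commits, 43))  # checkpoint 43 was never committed
```

<p>Rolling back newer commits is just the policy this model assumes. Fast-forwarding the Kafka offsets to match those commits would also prevent duplicates. 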
The system can reason about which checkpoint corresponds to which commit and recover without guessing.</p><div><hr></div><h4><strong>Production results: faster data with lower cost</strong></h4><p>The team onboarded datasets to the Flink-based ingestion platform and validated performance on some of Uber&#8217;s largest datasets.</p><p>The early results:</p><ul><li><p><strong>Freshness:</strong> improved from hours to <strong>minutes-level freshness</strong>.</p></li><li><p><strong>Efficiency:</strong> <strong>25% reduction in compute usage</strong> compared to batch ingestion.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HbzO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HbzO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 424w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 848w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1272w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HbzO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif" width="768" height="210" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:210,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7326,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/182833470?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HbzO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 424w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 848w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1272w, https://substackcdn.com/image/fetch/$s_!HbzO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e2433c-5ac8-447c-b7de-9562a68daebd_768x210.avif 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Before and after streaming ingestion (Source: Uber)</figcaption></figure></div><div><hr></div><h4><strong>Extending real-time beyond ingestion</strong></h4><p>IngestionNext cuts ingestion latency from online Kafka into the offline raw data lake. That&#8217;s a big step but it&#8217;s not the full story.</p><p>Freshness still stalls downstream in the transformation and analytics layers. If ingestion takes minutes but transformation is still slow, the data is still stale at the point of decision.</p><p>The next frontier for Uber is extending real-time capability end-to-end: <strong>ingestion &#8594; transformation &#8594; real-time insights and analytics</strong>. This matters because Uber&#8217;s lake powers a long list of domains: Delivery, Mobility, Machine Learning, Rider, Marketplace, Maps, Finance and Marketing Analytics. Freshness is a cross-cutting requirement.</p><div><hr></div><h4><strong>Conclusion</strong></h4><p>Uber&#8217;s shift from batch to streaming ingestion is a meaningful platform milestone. By re-architecting ingestion around Apache Flink, IngestionNext delivers fresher data, stronger reliability and scalable efficiency across a petabyte-scale lake.</p><p>The design is not just &#8220;run Flink jobs&#8221;. It includes operational foundations like an automated control plane, resiliency strategies and streaming-specific engineering work: faster compaction via row-group merging, skew controls and deterministic recovery by linking Flink checkpoints to Hudi commits.</p><p>The bigger idea is the mindset shift: treating freshness as a first-class dimension of data quality. 
With IngestionNext proven in production, the next push is clear: bring streaming into downstream transformation and analytics so the company can close the real-time loop, not just ingest data faster.</p><div><hr></div><h3>The full scoop</h3><p>To learn more, check out the post on <a href="https://www.uber.com/en-AU/blog/from-batch-to-streaming-accelerating-data-freshness-in-ubers-data-lake/">Uber's Engineering Blog</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. Really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-uber-cut-data-lake-freshness-from-hours-to-minutes-with-flink?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div 
class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1904c23f-5462-4150-9c60-b6ad712234b6&quot;,&quot;caption&quot;:&quot;How do you keep ML teams fast when every experiment blasts your Spark cluster with spiky workloads, huge datasets and five different file formats?<br /><br />Snap&#8217;s answer: Prism, a unified Spark platform that hides infra pain, standardises pipelines and supports everything from ad-hoc exploration to 10k+ daily jobs in production.<br /><br />This post breaks down why raw Spark wasn&#8217;t enough, what Prism fixes and how Snap rebuilt their ML data layer without ditching Spark.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-20T04:59:47.340Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs&quot;,&quot;section_name&quot;:&quot;Data 
Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:179211962,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8b717c15-913f-4e54-91a7-fb3f26e15721&quot;,&quot;caption&quot;:&quot;How do you keep data fresh for millions of merchants when you&#8217;re streaming from 100+ MySQL shards?<br /><br />Shopify&#8217;s answer: a 400TB Change Data Capture platform that pushes up to 100k events a second.<br /><br />This post dives into the trade-offs, the challenges and the lessons learned from building CDC at scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Shopify Uses Change Data Capture to Serve Millions of Merchants&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-09-18T07:53:42.206Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1730818874996-dea4bddf5554?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaG9waWZ5fGVufDB8fHx8MTc1ODE4MDY0NHww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-shopify-uses-change-data-capture-to-serve-millions-of-merchants&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:173822667,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What is Data Governance? 
A Practical Guide to Building Trustworthy Data in the Age of AI]]></title><description><![CDATA[From unclear ownership to missing standards, Charlotte Ledoux breaks down the simple governance practices that help organisations trust their data and ship faster.]]></description><link>https://www.datatinkerer.io/p/what-is-data-governance-a-practical-guide</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-is-data-governance-a-practical-guide</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 11 Dec 2025 04:01:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/71bf29d1-b0c7-4d21-9ce4-22fa2a91c4dd_760x546.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>Today I will be talking with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Charlotte Ledoux&quot;,&quot;id&quot;:30007326,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!TWOB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9f229-e6b2-4839-bc95-ddb75482793e_750x750.jpeg&quot;,&quot;uuid&quot;:&quot;c9425cfe-0db2-443a-8739-d6932d500d30&quot;}" data-component-name="MentionToDOM">Charlotte Ledoux</span>, who writes <em>The Data Governance Playbook</em> newsletter and works with companies on implementing data governance.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2433880,&quot;name&quot;:&quot;The Data Governance Playbook&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!74lv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6350a0-688c-4368-9643-0cb5b7adc910_383x383.png&quot;,&quot;base_url&quot;:&quot;https://thedatagovernanceplaybook.substack.com&quot;,&quot;hero_text&quot;:&quot;Your go-to resource for mastering Data Governance through practical tips, 
expert insights, and a touch of humour !&quot;,&quot;author_name&quot;:&quot;Charlotte Ledoux&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://thedatagovernanceplaybook.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!74lv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6350a0-688c-4368-9643-0cb5b7adc910_383x383.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">The Data Governance Playbook</span><div class="embedded-publication-hero-text">Your go-to resource for mastering Data Governance through practical tips, expert insights, and a touch of humour !</div><div class="embedded-publication-author-name">By Charlotte Ledoux</div></a><form class="embedded-publication-subscribe" method="GET" action="https://thedatagovernanceplaybook.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p>I discovered her work through the <a href="https://www.whoisthebestcdo.com">CDO game</a> (worth trying if you haven&#8217;t!). 
It reminded me how often data governance is misunderstood, even as it becomes essential with the rise of AI.</p><p>We talked about her move from analytics to governance, how the real value comes from clarity and ownership rather than tools and why the smartest governance programs start with listening long before they start with policies.</p><p>So without further ado, let&#8217;s get into it!</p><div><hr></div><h4><strong>Can you tell us about your role?</strong></h4><p>I&#8217;m a Data &amp; AI Governance expert. In practice, that means I make sure an organisation&#8217;s data is trustworthy, secure and responsibly used, especially as AI adoption accelerates. I help define the roles, responsibilities, processes and tools that govern how data is collected, shared, protected and used so that teams can innovate with confidence rather than chaos.</p><div><hr></div><h4><strong>How did you break into data governance?</strong></h4><p>Before specializing in data governance, I worked more hands-on in the data ecosystem, collaborating with data teams on data science, analytics and data strategy. Over time, I realized that the biggest blockers to effective data use weren&#8217;t tools or skills but rather unclear ownership, missing standards and a lack of trust.</p><p>Governance drew me in because it sits at the intersection of strategy, quality, ethics and business value. It&#8217;s the discipline that creates the structure needed for data to actually deliver impact.</p><div class="pullquote"><p><em><strong>Charlotte&#8217;s path</strong></em></p><p><em><strong>data analytics &#8594; data strategy &#8594; data governance</strong></em></p></div><h4><strong>So what is data governance? How do you explain it simply?</strong></h4><p>Data governance is the framework that ensures data is reliable, secure and used appropriately. It defines the rules, responsibilities and processes that allow an organization to manage data (and now AI!) 
in a controlled and value-driven way.</p><p>A simpler version I often use: it&#8217;s about enabling people to do great things with data.</p><div class="pullquote"><p><em><strong>It defines the rules, responsibilities and processes that allow an organization to manage data in a controlled and value-driven way.</strong></em></p></div><h4><strong>What&#8217;s a common misunderstanding about data governance?</strong></h4><p>Many think governance is about control and restriction, slowing teams down with rules. In reality, good governance accelerates every project that uses data. It creates clarity, improves data quality and makes it easier for people to use data and AI responsibly rather than reinventing processes or taking unnecessary risks.</p><div><hr></div><h4><strong>You&#8217;ve talked about <a href="https://thedatagovernanceplaybook.substack.com/p/the-one-playbook-every-c-level-asks">executives reacting faster to quantified waste than to better data hygiene</a>. How do you translate governance issues into numbers leaders care about?</strong></h4><p>Leaders respond when governance is tied to measurable business impact. Instead of talking about &#8220;data quality,&#8221; I translate issues into cost, risk and revenue language:</p><ul><li><p><em><strong>Time wasted searching, cleaning or reconciling data</strong></em> &#8594; hours &#215; cost per hour</p></li><li><p><em><strong>Duplicate data or tools</strong></em> &#8594; direct spend and maintenance costs</p></li><li><p><em><strong>Regulatory exposure</strong></em> &#8594; fines or incidents avoided</p></li><li><p><em><strong>Delayed projects due to poor data</strong></em> &#8594; revenue or launch impact</p></li></ul><p>This makes it easier to get senior business leaders on board with data governance initiatives.</p><div><hr></div><h4><strong>Can you share a real-life example where weak governance caused a costly or painful problem?</strong></h4><p>Yes, and it&#8217;s more common than we like to admit. 
In one organization, different teams were reporting revenue with slightly different definitions. Each department owned its own dashboard, pulling data from different sources.</p><p>Each quarter, one analyst spent 3 weeks manually reconciling numbers to produce a single figure for the board. Decisions were delayed because no one trusted the data.</p><div><hr></div><h4><strong>What about a governance win you&#8217;re proud of that changed a real business outcome?</strong></h4><p>A project I&#8217;m proud of involved building a simple, scalable data quality process for promotion data at a large FMCG company. Before this, teams spent a huge amount of time cleaning and reconciling data because errors were only discovered downstream, often too late and sometimes right before reporting.</p><p>We introduced <em><strong>early anomaly detection, ownership rules and validation checks at ingestion.</strong></em> Suddenly, issues were caught in days instead of weeks. Promotion data became cleaner, more complete and much easier to work with.</p><div><hr></div><h4><strong>Why do data governance programs stall?</strong></h4><p>Most governance programs stall before they properly begin because buy-in wasn&#8217;t earned early enough. Governance must be framed as something that solves business problems.</p><p>Successful programs start with listening, mapping pain points and showing potential ROI to top management and business sponsors.</p><div class="pullquote"><p><em><strong>Governance must be framed as something that solves business problems.</strong></em></p></div><h4><strong>How do you keep governance from becoming a blocker for data engineers and scientists who just want to ship things?</strong></h4><p>The key is to make governance help people do their job faster, not slower. 
Instead of leading with policies, create enablement, shortcuts and templates:</p><ul><li><p>Bake standards into tooling instead of PDFs nobody reads</p></li><li><p>Automate quality checks with data engineers</p></li><li><p>Co-design rules so teams feel ownership</p></li><li><p>Start with access policies that make it easier, not harder, to use data</p></li></ul><div><hr></div><h4><strong>What are some underrated governance practices that give a lot of value without needing a big tool?</strong></h4><p>Three high-impact, low-cost moves:</p><ol><li><p><em><strong>Clear data ownership.</strong></em> One accountable owner per domain or dataset reduces confusion and speeds decision-making.</p></li><li><p><em><strong>A shared vocabulary.</strong></em> Even a simple glossary covering at least the top 5 metrics or the most important business terms helps.</p></li><li><p><em><strong>Lightweight quality standards.</strong></em> A checklist for critical datasets (freshness, documentation, lineage, SLAs) prevents firefighting later.</p></li></ol><p>None of this requires a platform, just alignment, a shared space and the discipline to iterate.</p><div><hr></div><h4><strong>If you join a company as the first data governance person, what are the first three conversations you have and with whom?</strong></h4><p><em><strong>1. Business leaders:</strong></em> to understand business goals and where data should create impact or reduce risk. Governance needs a purpose, not a template.</p><p><em><strong>2. The CDO (or whoever owns the data strategy):</strong></em> to clarify priorities, risk appetite and what &#8220;good&#8221; looks like for the organization.</p><p><em><strong>3. Data consumers (analytics, finance, operations, AI teams):</strong></em> to map which decisions depend on data and what trust issues or inefficiencies slow them down today.</p><div><hr></div><h4><strong>Has the rise of AI changed how you think about governance?</strong></h4><p>Absolutely. 
AI moved governance from &#8220;somewhat important&#8221; to non-negotiable. With traditional data, quality issues were inconvenient; with AI, they are amplified and automated at scale. AI forces us to think beyond data assets to models, prompts, lineage of training data, human oversight and impact on decisions.</p><p>That&#8217;s why all Data Governance teams should rename themselves Data &amp; AI Governance teams.</p><div><hr></div><h4><strong>What&#8217;s one thing you wish you had known earlier about data governance?</strong></h4><p>That governance is 80% change management and communication and only 20% frameworks and policies. Early on, I focused heavily on standards, operating models and processes, but I learned quickly that none of it matters if people don&#8217;t understand the &#8220;why&#8221; or see what&#8217;s in it for them.</p><p>Success comes from building trust, telling a clear value story and bringing people along the journey.</p><div class="pullquote"><p><em><strong>Governance is 80% change management and communication and only 20% frameworks and policies.</strong></em></p></div><p>If you enjoyed reading this and want to learn more about Data and AI Governance by playing games, check out <a href="https://www.datagovia.com">Datagovia</a>, where Charlotte is gamifying data governance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.datagovia.com" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Oyy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 424w, 
https://substackcdn.com/image/fetch/$s_!9Oyy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 848w, https://substackcdn.com/image/fetch/$s_!9Oyy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 1272w, https://substackcdn.com/image/fetch/$s_!9Oyy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Oyy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png" width="1070" height="738" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:738,&quot;width&quot;:1070,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:881365,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.datagovia.com&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179703922?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!9Oyy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 424w, https://substackcdn.com/image/fetch/$s_!9Oyy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 848w, https://substackcdn.com/image/fetch/$s_!9Oyy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 1272w, https://substackcdn.com/image/fetch/$s_!9Oyy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab6cbcc-0940-46a7-b963-9d433b8b5ac4_1070x738.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div><hr></div><p>Is there a question you would like to ask?</p><p><strong>Let me know your thoughts by replying to the email or leaving a comment below!</strong></p><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. Really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE3NjU0MTk3NSwiaWF0IjoxNzY1MzQ0NTQ5LCJleHAiOjE3Njc5MzY1NDksImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.TwR07_dJBU_1yTMdfPnNbZan2iWnLVWe-dBogi_APh4&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE3NjU0MTk3NSwiaWF0IjoxNzY1MzQ0NTQ5LCJleHAiOjE3Njc5MzY1NDksImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.TwR07_dJBU_1yTMdfPnNbZan2iWnLVWe-dBogi_APh4"><span>Share</span></a></p><div><hr></div><h3><strong>Keep reading</strong></h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;21d8566a-5219-4d73-a9b6-4137fac46ef2&quot;,&quot;caption&quot;:&quot;From Data Analyst to Senior Data Science Manager at Skyscanner<br /><br />Following on from previous posts talking to people in the 
field, today I will be talking with Jose Parre&#241;o Garcia who is a Senior Data Science Manager at Skyscanner and writer of the Senior Data Science Lead newsletter.<br /><br />We talked about his rise from data analyst to Senior DS Manager at Skyscanner, what &#8220;production-ready&#8221; really means and why the real intelligence in data science lives before and after the model.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;From Data Analyst to Senior DS Manager at Skyscanner&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:255728031,&quot;name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;bio&quot;:&quot;Helping tech managers build efficient teams, data professionals master storytelling and guiding those looking to break into Data Science. I have built teams from scratch and lead 50+ data scientists @Skyscanner. 
Now, I share my experience with you.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://joseparreogarcia.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://joseparreogarcia.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;Senior Data Science Lead&quot;,&quot;primaryPublicationId&quot;:2833541}],&quot;post_date&quot;:&quot;2025-11-13T03:54:26.969Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06735d58-e8f2-4106-88ae-efe0658c217c_764x661.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner&quot;,&quot;section_name&quot;:&quot;Data Science&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:176541975,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f8302e99-65e1-4756-982f-997629702ecd&quot;,&quot;caption&quot;:&quot;Kicking off a new series where data folks share how they got here, what they&#8217;ve learned and how they actually work day to day.<br /><br />First up: Alejandro Aboy, Senior Data 
Engineer at Workpath and writer of The Pipe and The Line<br /><br />We talked about his path from marketing to data engineering, why his teammates call him an octopus and his take that &#8220;big data&#8221; is a myth for most teams.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;From Marketing to Data Engineering: How I Made the Switch&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:22949723,&quot;name&quot;:&quot;Alejandro Aboy&quot;,&quot;bio&quot;:&quot;Ex Web Analytics Specialist, currently working as Data Engineer. 
Building &amp; growing my Data &amp; AI career by sharing the tools and lessons I wish I had when I started.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://thepipeandtheline.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://thepipeandtheline.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;The Pipe &amp; The Line&quot;,&quot;primaryPublicationId&quot;:1196229}],&quot;post_date&quot;:&quot;2025-10-16T04:01:28.208Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f252334-7437-40b5-9f82-08c981de2f6d_761x764.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/from-marketing-to-data-engineering-how-i-made-the-switch&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:175928283,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in November 2025]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in November 
2025.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-november-2025</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Wed, 03 Dec 2025 07:52:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c31550f6-1fdf-4738-b384-2eeb55f71662_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k7yZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k7yZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k7yZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!k7yZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb3ffdb9-87b8-43ce-a123-5ac639dcef82_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources for learning all things data (science, engineering, analysis). They include videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. 
So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for November.</p><div><hr></div><h3>Data science &amp; AI</h3><ul><li><p><strong><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Effective context engineering for AI agents</a> (16 minute read)<br></strong>Anthropic&#8217;s Applied AI team explains that context engineering is now the real bottleneck in agent design - the craft of curating a small set of high-signal tokens so agents stay focused, retrieve what they need just-in-time, and remain coherent across long, complex tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Af5I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Af5I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 424w, 
https://substackcdn.com/image/fetch/$s_!Af5I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 848w, https://substackcdn.com/image/fetch/$s_!Af5I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 1272w, https://substackcdn.com/image/fetch/$s_!Af5I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Af5I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68676,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Af5I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 424w, https://substackcdn.com/image/fetch/$s_!Af5I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 848w, https://substackcdn.com/image/fetch/$s_!Af5I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 1272w, https://substackcdn.com/image/fetch/$s_!Af5I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f278b6-2003-49f6-aa48-4f390ca7afa1_2292x1290.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li>
<li><p><strong><a href="https://hackernoon.com/simple-battle-tested-algorithms-still-outperform-ai?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Simple, Battle-Tested Algorithms Still Outperform AI</a> (11 minute read)<br></strong>Jose Crespo argues that companies are wasting hundreds of billions on overhyped AI that delivers negative ROI, while a handful of simple, century-old algorithms plus competent programmers crush it on real business problems.</p></li><li><p><strong><a href="https://medium.com/data-from-the-trenches/how-can-you-identify-an-agentic-ai-use-case-b95b3fa45600?utm_source=datatinkerer.io&amp;utm_medium=newsletter">How Can You Identify an Agentic AI Use Case?</a> (13 minute read)<br></strong>Pierre Petrella explains that an AI agent is an LLM equipped with tools, a clear task and the ability to reason, and gives a practical framework for spotting high-value agent use cases in your org by mapping inputs, outputs, tools, scope and playbooks.</p></li><li><p><strong><a href="https://www.philschmid.de/gemini-3-prompt-practices?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Gemini 3 Prompting: Best Practices for General Usage</a></strong> <strong>(6 minute read)<br></strong>Philipp Schmid from Google DeepMind lays out practical prompting patterns for Gemini 3: be direct, use a clear structure (XML/Markdown), plan and self-check explicitly, and rely on agent-style tool use and domain-specific workflows to get more reliable, higher-quality outputs.</p></li><li><p><strong><a href="https://magazine.sebastianraschka.com/p/beyond-standard-llms?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Beyond Standard LLMs</a> (36 minute 
read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;c17c1f39-f912-43c1-a223-029498f3e844&quot;}" data-component-name="MentionToDOM">Sebastian Raschka</span> reviews four non-standard LLM directions - linear-attention hybrids, text diffusion, code world models and tiny recursive transformers - and concludes they&#8217;re promising niche upgrades but still complements to, not replacements for, classic autoregressive transformers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iKIw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iKIw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 424w, https://substackcdn.com/image/fetch/$s_!iKIw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 848w, https://substackcdn.com/image/fetch/$s_!iKIw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!iKIw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iKIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp" width="1137" height="574" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:574,&quot;width&quot;:1137,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53336,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iKIw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 424w, https://substackcdn.com/image/fetch/$s_!iKIw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 848w, 
https://substackcdn.com/image/fetch/$s_!iKIw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 1272w, https://substackcdn.com/image/fetch/$s_!iKIw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfeb7edd-8553-44ef-9509-5702e867c287_1137x574.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://www.dwarkesh.com/p/bits-per-sample?utm_source=datatinkerer.io&amp;utm_medium=newsletter">RL is 
even more information inefficient than you thought</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Dwarkesh Patel&quot;,&quot;id&quot;:4281466,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb715ffd1-f7d7-4755-af88-c48efe647f5b_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;97513909-1dc9-432f-af80-b0f229ce88d8&quot;}" data-component-name="MentionToDOM">Dwarkesh Patel</span> argues that RL learns far fewer bits per sample than supervised learning, so it only becomes efficient once models are already strong.</p></li><li><p><strong><a href="https://huggingface.co/blog/codelion/optimal-dataset-mixing?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix</a> (10 minute read)<br></strong>Asankhaya Sharma finds that a simple 50-30-20 data mix (FinePDFs, DCLM-baseline, FineWeb-Edu) lets a GPT-2 model trained on 1B tokens reach over 90% of GPT-2&#8217;s performance using a tenth of the data, outperforming all the curriculum strategies tested.</p></li><li><p><strong><a href="https://alechelbling.com/blog/isomap/?utm_source=datatinkerer.io&amp;utm_medium=newsletter">A Visual Introduction to Dimensionality Reduction with Isomap</a> (9 minute read)<br></strong>Alec Helbling gives a visual (and mathematical) introduction to dimensionality reduction with Isomap.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7qYm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!7qYm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 424w, https://substackcdn.com/image/fetch/$s_!7qYm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 848w, https://substackcdn.com/image/fetch/$s_!7qYm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 1272w, https://substackcdn.com/image/fetch/$s_!7qYm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7qYm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif" width="582" height="508" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:508,&quot;width&quot;:582,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:626427,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!7qYm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 424w, https://substackcdn.com/image/fetch/$s_!7qYm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 848w, https://substackcdn.com/image/fetch/$s_!7qYm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 1272w, https://substackcdn.com/image/fetch/$s_!7qYm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bbc7707-32ed-47d6-bd8b-c863fe8b83e3_582x508.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li>
<li><p><strong><a href="https://read.futureproofds.com/p/semantic-layers-and-the-future-of-agentic-analytics?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Semantic Layers and the Future of Agentic Analytics</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Andres Vourakis&quot;,&quot;id&quot;:135808578,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ebf6fc-4ed6-47e1-938e-a1fa37a2347e_1601x1646.jpeg&quot;,&quot;uuid&quot;:&quot;168cd081-fd91-49d2-bd19-2df20ca56021&quot;}" data-component-name="MentionToDOM">Andres Vourakis</span> says Agentic Analytics is the next big shift, powered by semantic layers and OSI standards that let AI agents finally understand business context instead of guessing.</p></li><li><p><strong><a href="https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner">From Data Analyst to Senior DS Manager at Skyscanner</a> (16 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;id&quot;:255728031,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;uuid&quot;:&quot;a0f6c0cb-2c71-487a-a919-2e5b53358690&quot;}" data-component-name="MentionToDOM">Jose Parre&#241;o Garcia</span> talks about 
his rise from data analyst to Senior DS Manager at Skyscanner, what &#8220;production-ready&#8221; really means and why the real intelligence in data science lives before and after the model.</p></li></ul><div><hr></div><h3>Data engineering</h3><ul><li><p><strong><a href="https://www.pracdata.io/p/is-ducklake-a-step-backward?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Is DuckLake a Step Backward?</a> (17 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alireza Sadeghi&quot;,&quot;id&quot;:3524999,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f2c1bd7-7ad0-45b3-a325-7e369db6965b_576x576.jpeg&quot;,&quot;uuid&quot;:&quot;56ab18b5-951a-4924-aefd-e969edfc9e2a&quot;}" data-component-name="MentionToDOM">Alireza Sadeghi</span> argues that DuckLake revives relational metadata for lakehouses to simplify operations and improve transactional safety, but may struggle to scale beyond mid-sized workloads unless the ecosystem and tooling mature.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yxqn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yxqn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 424w, https://substackcdn.com/image/fetch/$s_!Yxqn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Yxqn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 1272w, https://substackcdn.com/image/fetch/$s_!Yxqn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yxqn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp" width="1456" height="983" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:983,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76168,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yxqn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 424w, 
https://substackcdn.com/image/fetch/$s_!Yxqn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 848w, https://substackcdn.com/image/fetch/$s_!Yxqn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 1272w, https://substackcdn.com/image/fetch/$s_!Yxqn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F603bbcdc-a6fc-4994-b200-3c2cee24d64e_1456x983.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://bigdata.2minutestreaming.com/p/event-streaming-is-topping-out?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Event Streaming is Topping Out</a> (16 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Stanislav Kozlovski&quot;,&quot;id&quot;:1057029,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!0lM5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bc8810-0db9-40c9-a28b-1a4752b7a135_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;1e0de64f-99d4-4650-8c78-2b16a435c0b5&quot;}" data-component-name="MentionToDOM"></span> argues that the real-time streaming market is saturated and shrinking, with Kafka vendors facing price wars, weak growth and looming consolidation even as Kafka itself remains entrenched.</p></li></ul><ul><li><p><strong><a href="https://javisantana.com/2024/11/30/learnings-after-4-years-data-eng.html?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Learnings after 4 years working with +50 companies on data engineering projects</a> (5 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Javi Santana&quot;,&quot;id&quot;:3218399,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a05a9a09-f78b-4b73-bab8-c8f892da61de_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;3e3617b5-8896-4b0d-8aaa-997d21348c86&quot;}" data-component-name="MentionToDOM"></span>&#8217;s takeaway after working with 50+ companies on data: forget the hype, fix your ingestion and schemas, drop unused data and keep your stack simple if you want real speed and savings.</p></li><li><p><strong><a 
href="https://www.dataengineeringweekly.com/p/the-dark-data-tax-how-hoarding-is?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The Dark Data Tax: How Hoarding is Poisoning Your AI</a> (10 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ananth Packkildurai&quot;,&quot;id&quot;:3520227,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mRE-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4f38fa68-8a30-4357-a48e-6833efe28c0f_989x989.jpeg&quot;,&quot;uuid&quot;:&quot;4161755f-d9e6-4a55-bc08-5ad17569d560&quot;}" data-component-name="MentionToDOM"></span> warns that data hoarding has become a metabolic failure: when 90% of tables go unused, they bloat costs, degrade the signal available to AI and choke analytics.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Y_E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Y_E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 424w, https://substackcdn.com/image/fetch/$s_!5Y_E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 848w, 
https://substackcdn.com/image/fetch/$s_!5Y_E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 1272w, https://substackcdn.com/image/fetch/$s_!5Y_E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Y_E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp" width="1456" height="610" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:610,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36482,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Y_E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 424w, 
https://substackcdn.com/image/fetch/$s_!5Y_E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 848w, https://substackcdn.com/image/fetch/$s_!5Y_E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 1272w, https://substackcdn.com/image/fetch/$s_!5Y_E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb7f415-2739-4a42-b5ef-cf103b850a11_1456x610.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong><a href="https://seattledataguy.substack.com/p/what-it-really-takes-to-move-from?utm_source=datatinkerer.io&amp;utm_medium=newsletter">What It Really Takes to Move From Senior to Staff Data Engineer</a> (10 minute read)<br></strong>A good discussion between <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;SeattleDataGuy&quot;,&quot;id&quot;:4963622,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec905aa-9a7b-4f21-b0ff-fec92e8916d1_512x512.jpeg&quot;,&quot;uuid&quot;:&quot;14565b76-e869-48fb-ac0f-85edc6801474&quot;}" data-component-name="MentionToDOM"></span> and a Staff Data Engineer at Apple about what it takes to make the jump into more senior roles.</p></li><li><p><strong><a href="https://pipeline2insights.substack.com/p/apache-spark-fundamentals-for-data-engineers?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Apache Spark Fundamentals for Data Engineers</a> (11 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Erfan Hesami&quot;,&quot;id&quot;:277538242,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!rcW2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2692f-48e0-43a5-9f33-7eebb007bd6e_1641x1641.jpeg&quot;,&quot;uuid&quot;:&quot;8ef80328-d089-467e-9f57-88325406bb27&quot;}" data-component-name="MentionToDOM"></span> explains how Apache Spark works from the ground up, using a simple CSV example to walk through its history, architecture, ecosystem and step-by-step execution.</p></li></ul><ul><li><p><strong><a 
href="https://medium.com/fresha-data-engineering/iceberg-cdc-stream-a-little-dream-of-me-a7c9f9e6e11d">Iceberg CDC: Stream a Little Dream of Me</a> (15 minute read)<br></strong>Anton Borisov breaks down how Iceberg&#8217;s v3 identity tracking and v4 compact metadata turn messy equality-delete upserts into predictable, low-overhead CDC, especially when paired with a catalog that hands out clean global deltas.</p></li><li><p><strong><a href="https://luminousmen.substack.com/p/data-warehouse-data-lake-data-lakehouse?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Data Warehouse, Data Lake, Data Lakehouse, Data Mesh: What They Are and How They Differ</a> (14 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;luminousmen&quot;,&quot;id&quot;:29227863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffead33a9-5e35-4522-b96e-c1a523419524_300x297.jpeg&quot;,&quot;uuid&quot;:&quot;1c99dc40-d3fc-48da-87e9-300eba2e16ca&quot;}" data-component-name="MentionToDOM"></span> breaks down the warehouse, lake, lakehouse and mesh, and explains how to pick the &#8216;right&#8217; one based on your biggest bottleneck.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!99Gg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!99Gg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 
424w, https://substackcdn.com/image/fetch/$s_!99Gg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 848w, https://substackcdn.com/image/fetch/$s_!99Gg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 1272w, https://substackcdn.com/image/fetch/$s_!99Gg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!99Gg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp" width="1456" height="1013" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1013,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55310,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!99Gg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 424w, https://substackcdn.com/image/fetch/$s_!99Gg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 848w, https://substackcdn.com/image/fetch/$s_!99Gg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 1272w, https://substackcdn.com/image/fetch/$s_!99Gg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F781f69e6-5913-48b0-a43e-389b3c1e6d80_1456x1013.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong><a href="https://dataengineeringcentral.substack.com/p/650gb-of-data-delta-lake-on-s3-polars?utm_source=datatinkerer.io&amp;utm_medium=newsletter">650GB of Data (Delta Lake on S3). Polars vs DuckDB vs Daft vs Spark</a> (9 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Daniel Beach&quot;,&quot;id&quot;:21715962,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;uuid&quot;:&quot;78ea35d1-6357-4443-a2ed-49c13101de36&quot;}" data-component-name="MentionToDOM"></span> shows that DuckDB, Polars and Daft can chew through a 650GB Delta Lake on a single 32GB node, proving most teams don&#8217;t need clusters nearly as often as they think.</p></li></ul><div><hr></div><h3>Data analysis and visualisation</h3><ul><li><p><strong><a href="https://nastengraph.substack.com/p/the-complete-guide-to-dashboard-testing?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The Complete Guide to Dashboard Testing: Ensuring Quality and Reliability</a> (7 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Anastasiya Kuznetsova&quot;,&quot;id&quot;:99725349,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!2E6h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb9d9c-d4e0-4f30-bc37-73eb9ffe4d53_516x534.png&quot;,&quot;uuid&quot;:&quot;1dc26c29-5e4c-41f9-a929-964070131975&quot;}" data-component-name="MentionToDOM"></span> explains how to bulletproof dashboards end-to-end by validating design, interaction, data pipelines, filters and load before they ever reach stakeholders.</p></li></ul><ul><li><p><strong><a href="https://www.datatinkerer.io/p/the-data-analysts-dilemma-accuracy-vs-speed">The Data Analyst&#8217;s Dilemma: Accuracy vs Speed</a> (7 minute read)<br></strong>The analyst&#8217;s dilemma: knowing when to aim for perfect accuracy and when &#8220;good enough&#8221; is all the business really needs.</p></li></ul><div><hr></div><h3><strong>Other interesting reads</strong></h3><ul><li><p><strong><a href="https://joereis.substack.com/p/eroding-the-edges-ai-generated-build?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Eroding the Edges: (AI-Generated) Build vs. Buy and the Future of Software</a> (12 minute read)<br></strong>Interesting take by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joe Reis&quot;,&quot;id&quot;:3531217,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e4716b1-c223-41e3-b943-def0291bf217_1175x783.jpeg&quot;,&quot;uuid&quot;:&quot;db2bd194-1cc2-49e6-ae81-56dcef484238&quot;}" data-component-name="MentionToDOM"></span> about &#8220;good enough&#8221; AI-coded apps blowing up the old buy vs. 
build calculus and pushing engineers to act more like orchestrators than builders.</p></li><li><p><strong><a href="https://www.artificialintelligencemadesimple.com/p/the-low-tech-revolution-why-ai-will?utm_source=datatinkerer.io&amp;utm_medium=newsletter">The Low-Tech Revolution: Why AI Will Transform the Industries Tech Forgot</a> (32 minute read)<br></strong><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Devansh&quot;,&quot;id&quot;:8101724,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48081c70-8afa-41e3-a44e-b0f917bc7577_1200x1600.jpeg&quot;,&quot;uuid&quot;:&quot;24e69315-a19e-45ac-ba9d-5f2c845b468d&quot;}" data-component-name="MentionToDOM"></span> makes the case that cheap models, edge hardware and agentic software flip the math for thin-margin sectors, making AI adoption inevitable even as platform consolidation threatens to capture most of the value.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sRRE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sRRE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 424w, https://substackcdn.com/image/fetch/$s_!sRRE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 848w, 
https://substackcdn.com/image/fetch/$s_!sRRE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 1272w, https://substackcdn.com/image/fetch/$s_!sRRE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sRRE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp" width="1456" height="1404" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1404,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sRRE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 424w, 
https://substackcdn.com/image/fetch/$s_!sRRE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 848w, https://substackcdn.com/image/fetch/$s_!sRRE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 1272w, https://substackcdn.com/image/fetch/$s_!sRRE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f739b13-507c-4a13-b551-3ad903761fa0_1456x1404.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p><strong><a href="https://epochai.substack.com/p/benchmark-scores-general-capability?utm_source=datatinkerer.io&amp;utm_medium=newsletter">Benchmark Scores = General Capability + Claudiness</a> (8 minute read)<br></strong>Greg Burnham shows that most benchmark scores collapse into a single &#8216;general capability&#8217; factor plus a smaller &#8216;Claudiness&#8217; axis. He argues this points to a world where broad skills partly generalize, but specific abilities still require targeted, paid-for improvements.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DmiF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DmiF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 424w, https://substackcdn.com/image/fetch/$s_!DmiF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 848w, https://substackcdn.com/image/fetch/$s_!DmiF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 1272w, https://substackcdn.com/image/fetch/$s_!DmiF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DmiF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp" width="1023" height="1279" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1279,&quot;width&quot;:1023,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41376,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/180567973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DmiF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 424w, https://substackcdn.com/image/fetch/$s_!DmiF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 848w, https://substackcdn.com/image/fetch/$s_!DmiF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!DmiF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbccb08b1-1013-478d-9117-2ac1fd107f1b_1023x1279.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li></ul><div><hr></div><h3><strong>Quick favor - need your take</strong></h3><div class="poll-embed" data-attrs="{&quot;id&quot;:414053}" data-component-name="PollToDOM"></div><p><strong>Was there any standout article or topic from November I missed? 
Feel free to drop a comment or hit reply; even a quick line helps.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.datatinkerer.io/p/how-expedia-monitors-1000-ab-tests-in-real-time-with-flink-and-kafka?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjoyOTE1OTA0NDIsInBvc3RfaWQiOjE2OTA5NDI3MywiaWF0IjoxNzU0NTE5MDY3LCJleHAiOjE3NTcxMTEwNjcsImlzcyI6InB1Yi0zNDIyNzQwIiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.oZvHOJmFWdVqE7IbG0eqLLsohZgpmGBltKU1W08ZN4c"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;27385512-431a-4d66-acc8-78b85b942c01&quot;,&quot;caption&quot;:&quot;It's time for another data/AI roundup and here are the highlights from October&#128071;<br /><br />Data Science &amp;amp; AI<br />How Gradient Descent Works<br />Recursive Language Models<br />The Continual Learning Problem<br />Why Analytics Agents Break Differently<br /><br />Data Engineering<br />How Kafka Works<br />Data Modeling for the Agentic Era<br />You&#8217;ll Never Have a FAANG Data Infrastructure<br />Getting Started with OpenMetadata<br /><br />Data Analysis &amp;amp; BI<br />Jobs-to-be-Done: Designing dashboards for what users need to achieve.<br />From Dental Cleaning to Data Cleaning: Pivoting into healthcare analytics.<br /><br />Plus: Real AI Agents and Real Work, Taking the Bitter Lesson Seriously: Let AI optimize compute, not humans, OpenAI Is a Consumer Company, Import AI 431: Technological optimism meets appropriate fear&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in October 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, 
analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-06T07:22:24.105Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a00481e6-bc3b-4419-9304-ed408b193853_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:178132882,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;cf4616de-7cd4-4ef7-8350-1d9b508a1b4e&quot;,&quot;caption&quot;:&quot;here are the highlights from September&#128071;<br /><br />Data Science &amp;amp; AI<br />Meta&#8217;s framework for data scientists as product leaders<br />23 RAG pitfalls (and fixes), post-training guide for LLMs<br />Kaggle Grandmasters&#8217; 7 tricks for tabular data<br />Anthropic&#8217;s tool-building with agents<br /><br />Data Engineering<br />Medallion Architecture with a Platinum layer<br />2025 data engineering trends<br />Apache Fluss for real-time changelogs<br />A blunt MotherDuck review<br /><br />Data Analysis &amp;amp; BI<br />Building AI data analysts with semantic layers and multi-agents<br />The mindset 
shift that separates good BI devs from great ones<br />The analyst&#8217;s dilemma of accuracy vs speed.<br /><br />Plus: China&#8217;s open-weight AI playbook, new SoTA on ARC-AGI with English over Python and how robot fleets, simulation and human video could fuel a &#8220;Robot GPT.&#8221;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What the Data Crowd Was Reading in September 2025&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-02T07:19:13.369Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45dce40b-ba8c-432c-88c2-c2e8ab6c888f_500x500.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-september-2025&quot;,&quot;section_name&quot;:&quot;Data Roundup&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:174976801,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:2,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Snap Rebuilt Its ML Platform to Handle 10,000+ Daily Spark Jobs]]></title><description><![CDATA[Inside Prism, the system that turned scattered Spark workflows into a unified, ML-ready platform.]]></description><link>https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 20 Nov 2025 04:59:47 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Snap unified Spark, ML workflows and 10k+ daily jobs under one platform.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TBRd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!TBRd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!TBRd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 848w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!TBRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F177f23c2-03d2-4ec5-bc67-33e5bb8b23fd_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>There are 100+ resources for learning all things data (science, engineering, analysis). The collection includes videos, courses and projects, and it can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Snap&#8217;s ML platform transformation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4000" height="6000" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:6000,&quot;width&quot;:4000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;text&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="text" title="text" 
srcset="https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1620396749162-d61bdb715793?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzbmFwY2hhdHxlbnwwfHx8fDE3NjM1Mjk0MDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@maygauthier">May Gauthier</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div>
<h3>TL;DR</h3><div><hr></div><h4><strong>Situation</strong></h4><p>Snap&#8217;s ML teams relied on Apache Spark for analytics, but raw Spark was painful for ML workflows: spiky, iterative training-data jobs, multiple data formats, scattered tooling and heavy cluster babysitting for every experiment.</p><h4><strong>Task</strong></h4><p>They needed an ML-focused data platform on top of Spark that hid the infrastructure, handled diverse formats, supported everything from fast experimentation to stable production and gave Spark users a single, coherent experience.</p><h4><strong>Action</strong></h4><p>They built Prism, a unified Spark platform with a UI and an SDK, config-driven Prism Templates for defining jobs in YAML, a control plane that uses Temporal workflows to manage cluster lifecycle, centralised metrics, autotuning and deep integration with Snap&#8217;s billing, orchestration and lakehouse tools.</p><h4><strong>Result</strong></h4><p>Prism grew from a handful of daily jobs to several thousand per day, with peaks of over 10k. It cut onboarding friction, standardised job patterns, improved reliability and let ML engineers focus on experiments and models instead of Spark internals and cluster management.</p><h4><strong>Use Cases</strong></h4><p>Feature engineering, batch model pipelines, lakehouse ingestion, experiment workflows</p><h4><strong>Tech Stack/Framework</strong></h4><p>Apache Spark, Apache Iceberg, Dataproc, Apache Airflow, Kubeflow, Apache Parquet, 
Trino</p><div><hr></div><h3>Explained further</h3><div><hr></div><h4>Context</h4><p>Apache Spark has been a core part of Snap&#8217;s analytical stack for a long time. It runs the pipelines that feed reports, dashboards and batch data products. For that world, Spark is a good fit.</p><p>Machine learning puts new pressure on that foundation.</p><p><strong>ML development is inherently iterative.</strong> An engineer can spend a week trying variations of the same broad idea: a different label definition, a refined feature set, a new way to slice users, a new preprocessing recipe. Each iteration often means regenerating training data from very large raw sources. The resulting load is not a nice, predictable nightly job. It is a series of intense, sometimes spiky workloads that hit the platform whenever someone has another idea.</p><p><strong>The development lifecycle is also more fluid.</strong> Early in a project, ML engineers want freedom. They want to pull ad hoc samples, tweak schemas on the fly and see results quickly. Once the same model is ready for production, the expectations flip. Pipelines need to be stable, observable and efficient on real traffic. The platform has to support both modes without asking people to throw away their early work and start from scratch.</p><p><strong>Then there is the question of formats.</strong> ML workloads do not live in a single format. Snap&#8217;s teams use:</p><ul><li><p>TFRecord when they are feeding TensorFlow</p></li><li><p>Protobuf when they are working with gRPC-based serving systems</p></li><li><p>JSON for lightweight exploration and simple tests</p></li><li><p>Parquet and Iceberg for analytical and lakehouse-style storage</p></li></ul><p>Forcing everything through a single &#8220;blessed&#8221; format would only slow teams down. 
A realistic platform needs to work comfortably across all of these.</p><div><hr></div><h4><strong>Where raw Spark started to hurt ML teams</strong></h4><p>Spark itself is not the weak link. It is powerful, battle-tested and extremely good at scaling SQL and batch workloads. The problem is not capability; it is usability for ML engineers.</p><p>Without the right abstractions:</p><ul><li><p>Engineers need to understand Spark internals and distributed systems just to write reasonable jobs.</p></li><li><p>Each team rebuilds the same boilerplate, such as data validation or common preprocessing.</p></li><li><p>They manage clusters, dependencies and upgrades themselves.</p></li><li><p>They spend time in the Spark UI and logs chasing down failures that add no value to the model itself.</p></li></ul><p>All of this is on top of their actual core stack: TensorFlow or PyTorch, notebooks, experiment tracking tools, workflow systems like Kubeflow and internal ML platforms. Spark is an ingredient they need for scale, not the centre of their role. When the support around it is thin, that ingredient becomes a constant source of overhead.</p><p>Snap&#8217;s ML engineers wanted to spend their time on experiments and models, not on reverse-engineering cluster failures. 
That is the gap the team set out to close.</p><div><hr></div><h4><strong>What Snap wanted from an ML data platform</strong></h4><p>The Snap team set a clear goal: build an ML-focused data platform on top of Spark that feels consistent, friendly and scalable, instead of &#8220;bare-metal Spark with some helpers&#8221;.</p><p>The platform should let ML engineers:</p><ul><li><p>Describe what data they need instead of hand-coding how to compute it</p></li><li><p>Iterate quickly without piles of glue code</p></li><li><p>Reuse proven, secure patterns for data processing</p></li><li><p>Spend their energy on model and product logic instead of infrastructure</p></li></ul><p>That means concentrating the heavy lifting in one place: infrastructure abstraction, reusable patterns, observability and integration with Snap&#8217;s internal ecosystem for metrics, billing, cost tracking and scheduling.</p><p>With that foundation in place, ML development becomes faster and more consistent, and the platform team can invest in shared improvements instead of firefighting one-off jobs.</p><div><hr></div><h4><strong>Boiling the problem down</strong></h4><p>So the problem is not &#8220;Spark is bad for ML&#8221;. The problem is that raw Spark is too low-level for the way ML teams actually work.</p><p>What Snap&#8217;s team built with Prism is a layer on top of Spark that:</p><ul><li><p>Hides cluster-level pain</p></li><li><p>Standardizes job patterns</p></li><li><p>Bakes in observability and cost awareness</p></li><li><p>Fits naturally into ML workflows rather than generic analytics</p></li></ul><p>Prism is the answer built around those constraints. 
It keeps Spark, but wraps it in a set of tools that match how ML teams at Snap actually work.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h-7U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h-7U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 424w, https://substackcdn.com/image/fetch/$s_!h-7U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 848w, https://substackcdn.com/image/fetch/$s_!h-7U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 1272w, https://substackcdn.com/image/fetch/$s_!h-7U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h-7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif" width="1456" height="649" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90288,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h-7U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 424w, https://substackcdn.com/image/fetch/$s_!h-7U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 848w, https://substackcdn.com/image/fetch/$s_!h-7U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 1272w, https://substackcdn.com/image/fetch/$s_!h-7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca80c17-a2dd-4633-8155-6a21f96599cd_2588x1153.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The new solution requirements (source: Snap)</figcaption></figure></div>
<div><hr></div><h4><strong>Meet Prism: Snap&#8217;s ML data platform on Spark</strong></h4><p>Prism is Snap&#8217;s unified Spark platform. From the outside, it looks like one coherent system that handles job authoring, productionisation and post-production operations.</p><p>Instead of each team creating its own way of submitting Spark jobs and wiring up clusters, Prism offers a consistent experience. Engineers can define jobs in a configuration-driven way, work in a UI when they want a visual surface, or go through an SDK when they need more control. Underneath, the platform handles cluster lifecycle, resource management, metrics collection and cost accounting.</p><p>Prism also aims for a serverless feel. 
Users submit work and adjust configurations, while the system decides how to spin up clusters, scale them and shut them down. Spark is still there underneath, but this changes how people interact with it.</p><p><strong>From experiment to production: the Prism user journey</strong></p><p>If you follow a typical workflow through Prism, you see three distinct phases.</p><p><strong>In pre-production, engineers are experimenting.</strong> They want to get a job off the ground quickly, test logic and refine their approach. Prism supports this by offering configuration-based templates and a UI where most of the setup is already done. Predefined profiles cover common use cases, which means teams are not spending their first week just tuning cluster settings.</p><p><strong>Once a job is ready to run regularly, it enters productionisation.</strong> At this point, Prism controls cluster setup, scaling and teardown through a unified API. Jobs can be tied into orchestration tools such as Airflow or Kubeflow without every team reinventing the wheel. Dashboards, metrics and metadata tracked by Prism give users a cleaner window into how their jobs behave.</p><p><strong>In post-production, attention moves to reliability and efficiency.</strong> Prism takes on this work by centralising monitoring and alerts, storing rich metrics and offering autotuning features that can recommend or apply improvements.
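</p><p>As a rough illustration of what &#8220;recommend improvements&#8221; can mean, here is a minimal sketch that right-sizes executor memory from observed peaks. It is a hypothetical heuristic, not Snap&#8217;s autotuner. The function name, the 20% headroom, the 2 GB floor and the assumption that per-run peak executor memory (in GB) is available from a metrics store are all illustrative choices. Only <code>spark.executor.memory</code> is a real Spark setting.</p><pre><code>import math


def recommend_executor_memory(peak_gb_per_run, current_gb, headroom=0.2, min_gb=2):
    """Suggest a new spark.executor.memory value from observed peaks.

    Hypothetical heuristic: take the worst peak across recent runs,
    add a safety headroom and round up to a whole gigabyte.
    Returns None when there is no history or no change is needed.
    """
    if not peak_gb_per_run:
        return None  # no history yet: keep the current setting
    worst_peak_gb = max(peak_gb_per_run)
    target_gb = max(min_gb, math.ceil(worst_peak_gb * (1 + headroom)))
    if target_gb == current_gb:
        return None  # already sized for the observed workload
    return {"spark.executor.memory": f"{target_gb}g"}


# Usage: a job provisioned with 16g whose recent runs peaked at 5.0-6.5 GB.
print(recommend_executor_memory([5.0, 6.1, 6.5], current_gb=16))
# prints {'spark.executor.memory': '8g'}
</code></pre><p>A platform that stores run history centrally can compute suggestions like this for every job, instead of each team tuning by hand.</p><p>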
Job costs are tracked, and infrastructure upgrades happen at the platform layer instead of per-team.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Kdk9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Kdk9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 424w, https://substackcdn.com/image/fetch/$s_!Kdk9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 848w, https://substackcdn.com/image/fetch/$s_!Kdk9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 1272w, https://substackcdn.com/image/fetch/$s_!Kdk9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Kdk9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif" width="1057" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1057,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14912,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Kdk9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 424w, https://substackcdn.com/image/fetch/$s_!Kdk9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 848w, https://substackcdn.com/image/fetch/$s_!Kdk9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 1272w, https://substackcdn.com/image/fetch/$s_!Kdk9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38c2eae-4802-4a68-b3b3-2a77fcb112ec_1057x400.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Prism components (Source: Snap)</figcaption></figure></div><div><hr></div><h4><strong>Prism architecture</strong></h4><p>Prism&#8217;s architecture is organised around two main user-facing interfaces and a set of internal systems.</p><p>For users, the first touchpoint is the Prism UI, a console where they can author jobs, inspect runs, debug failures and tune performance. The second is a client SDK that exposes Prism&#8217;s capabilities programmatically. Together, these give ML and data engineers both an interactive and an automated way to work with Spark.</p><p>Behind those sits Prism Template, a framework for composing Spark jobs out of structured, reusable blocks.
Instead of asking every engineer to shape their own Spark application, Prism Template gives them a vocabulary of modules they can chain together with YAML.</p><p>All external requests hit a central API and then flow into the Prism Control Plane. This control plane is responsible for managing job metadata and configuration. It delegates orchestration work to a workflow system powered by Temporal. Temporal workflows handle cluster provisioning, job submission, retries, cancellation and similar runtime tasks.</p><p>The whole stack ties into Snap&#8217;s internal services for metrics, cost tracking and orchestration. The idea is to have one platform that understands both the Spark world and the rest of Snap&#8217;s infrastructure landscape.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2ACo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2ACo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 424w, https://substackcdn.com/image/fetch/$s_!2ACo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 848w, https://substackcdn.com/image/fetch/$s_!2ACo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 1272w, 
https://substackcdn.com/image/fetch/$s_!2ACo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2ACo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif" width="1456" height="1151" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1151,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33398,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2ACo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 424w, https://substackcdn.com/image/fetch/$s_!2ACo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 848w, 
https://substackcdn.com/image/fetch/$s_!2ACo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 1272w, https://substackcdn.com/image/fetch/$s_!2ACo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3ce2eff-d230-4cda-8a54-b771e0470e38_1600x1265.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Prism architecture (Source: Snap)</figcaption></figure></div><p><strong>A single home for Spark users: Prism UI
console</strong></p><p>Before the Prism UI, Spark users at Snap lived inside a messy toolkit. They used an infra-owned library and CLI to standardise job submission, the Spark UI and History Server to debug jobs, HDFS tooling for storage, the Dataproc Console and Stackdriver for cloud-level views, and separate internal systems for metrics and cost.</p><p>Each of these tools did something useful, but together they made for a scattered experience. New users had to learn not only Spark but also which tool to open when something went wrong. Even experienced teams were wasting time stitching context together from five tabs.</p><p>The Prism UI Console was built to consolidate this sprawl. It gives Spark users a single place to search jobs, view runs, understand configuration, inspect metrics and author new work. Platform teams now have a clear surface to invest in, and engineers have one source of truth for their Spark workloads.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;1753870b-b071-44e3-9630-3b5947620d00&quot;,&quot;duration&quot;:null}"></div><p>The result is lower friction, faster onboarding and a much clearer path for incremental usability improvements in the future.</p><p><strong>What you can do in the Prism UI</strong></p><p>The Prism UI is the primary surface for Spark users at Snap. Key capabilities include:</p><ul><li><p><strong>Unified job search: </strong>A central search page lets users filter by job name, namespace, cluster ID, and other attributes. When something breaks, they no longer have to hop between multiple systems.</p></li><li><p><strong>Metadata storage: </strong>Job and cluster metadata, such as configurations, metrics, and lineage, are stored in a scalable backend.
This supports analytics, audits, and better platform decisions.</p></li><li><p><strong>Logical job grouping for trend analysis: </strong>Jobs are grouped by orchestration task IDs, for example from Airflow or Kubeflow. This makes it easy to look at long-term trends in runtime, cost, and resource use for a specific workflow.</p></li><li><p><strong>Integrated real-time cost estimation: </strong>Through integration with internal billing systems, users can see cost estimates while jobs run. This is especially helpful during heavy experimentation when budgets matter.</p></li><li><p><strong>One-click utilities and deep links: </strong>Utilities like job cloning, as well as deep links into the Spark UI, logs, and output tables, make iteration and debugging faster.</p></li><li><p><strong>Integrated job authoring: </strong>Users can:</p><ul><li><p>Configure sources and sinks with built-in support for Iceberg, TFRecords, Parquet, and BigQuery</p></li><li><p>Use an SQL editor with autocomplete powered by a metastore-aware schema integration</p></li><li><p>Pick preconfigured job profiles created by Spark experts</p></li><li><p>Jump directly to the Lakehouse UI for Iceberg-backed outputs and query them via Trino</p></li><li><p>Create low-code jobs using Prism Templates</p></li></ul></li></ul><p>This is where most ML users feel the impact. Instead of wrestling with scattered tools, they have a single console designed for how they work.</p><div><hr></div><h4><strong>Prism templates: opinionated Spark jobs without the boilerplate</strong></h4><p>Spark&#8217;s flexibility is both a strength and a risk. The same workload can be written in very different ways, which leads to wide differences in structure, resource usage and maintainability. In a large organisation, that turns into a support headache.</p><p>Prism Template is Snap&#8217;s way of putting structure on top of that flexibility.
Instead of everyone writing full Spark applications, users define jobs in YAML using reusable modules and standardised patterns. The platform takes ownership of job bootstrap, configuration and core logic wiring.</p><p>This approach makes experimentation easier for ML engineers. They can assemble pipelines quickly without having to understand every detail of Spark&#8217;s internals. Later on, as jobs move toward production, teams can adjust centralised configurations and modules to scale more gracefully, without rewriting their application code.</p><p><strong>Why templates matter in practice</strong></p><p>The core benefits of this approach:</p><ul><li><p><strong>Simplified job authoring: </strong>Users describe jobs via a YAML file instead of writing low-level Spark boilerplate. The same definition can move from local development to staging and then production.</p></li><li><p><strong>High-quality, reusable components: </strong>The platform ships with modules that cover tasks like:</p><ul><li><p>Iceberg ingestion</p></li><li><p>Feature extraction</p></li><li><p>Sequence building and manipulation</p></li></ul><p>These modules encode best practices, which reduces user errors and keeps jobs consistent.</p></li><li><p><strong>Integrated observability and tooling: </strong>Because Prism owns the bootstrap and core modules, it can inject metrics, logging, and other operational hooks in a uniform way.</p></li><li><p><strong>Managed versioning: </strong>Job definitions, driver JARs, and plugin JARs are versioned and managed centrally. 
This supports safe upgrades and stable behaviour across environments.</p></li><li><p><strong>Customisable templates: </strong>Users can start from pre-built templates, then add their own parameters or chain modules in new ways.</p></li></ul><p>The example below shows a Prism Template YAML snippet that combines two modules in one job: one that builds an ordered sequence column and one that ingests the result into an Iceberg table.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4ziz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4ziz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 424w, https://substackcdn.com/image/fetch/$s_!4ziz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 848w, https://substackcdn.com/image/fetch/$s_!4ziz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 1272w, https://substackcdn.com/image/fetch/$s_!4ziz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!4ziz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png" width="1293" height="783" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:783,&quot;width&quot;:1293,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54191,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4ziz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 424w, https://substackcdn.com/image/fetch/$s_!4ziz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 848w, https://substackcdn.com/image/fetch/$s_!4ziz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 1272w, https://substackcdn.com/image/fetch/$s_!4ziz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea38cec-5aad-4034-9a08-8e37ac4dec38_1293x783.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The structure is easy to read, and the heavy logic lives in reusable modules.</p><div><hr></div><h4><strong>The Prism control plane: making Spark feel serverless</strong></h4><p>Snap&#8217;s first take on a control plane was lean. It wrapped <a href="https://docs.cloud.google.com/dataproc/docs/reference/rest">Dataproc APIs</a>, handled some permissions and exposed separate concepts for clusters and jobs. Orchestration tools such as Airflow still had to own the cluster lifecycle, including creation, reuse, teardown and failure handling.</p><p>At small scale, that model works.
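</p><p>To make that model concrete, here is a hypothetical Python sketch of the glue code an orchestration task had to carry when it owned the cluster lifecycle. <code>FakeClusterClient</code> and its <code>create_cluster</code>, <code>submit_job</code> and <code>delete_cluster</code> methods are in-memory stand-ins, not the Dataproc client API. The three-attempt retry policy is also an assumption.</p><pre><code>import uuid


class FakeClusterClient:
    """In-memory stand-in for a cloud cluster API (hypothetical)."""

    def __init__(self, failures_before_success=0):
        self.live_clusters = set()
        self.failures_left = failures_before_success

    def create_cluster(self, profile):
        cluster_id = f"{profile}-{uuid.uuid4().hex[:8]}"
        self.live_clusters.add(cluster_id)
        return cluster_id

    def submit_job(self, cluster_id, job):
        if self.failures_left:
            self.failures_left -= 1
            raise RuntimeError(f"transient failure in {job}")
        return f"{job} finished on {cluster_id}"

    def delete_cluster(self, cluster_id):
        self.live_clusters.discard(cluster_id)


def run_on_dedicated_cluster(client, job, profile="default", max_attempts=3):
    """Caller-owned lifecycle: create, submit with retries, always tear down."""
    cluster_id = client.create_cluster(profile)
    try:
        for attempt in range(1, max_attempts + 1):
            try:
                return client.submit_job(cluster_id, job)
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # retry budget and policy were chosen per team
    finally:
        # Skipping this step leaves an idle cluster running (and billing).
        client.delete_cluster(cluster_id)


client = FakeClusterClient(failures_before_success=2)
print(run_on_dedicated_cluster(client, "daily_features"))
print(client.live_clusters)  # set(): the cluster was torn down
</code></pre><p>Every team that wrote its own version of this function made its own choices about retries and cleanup.</p><p>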
As usage grows, human-managed cluster lifecycle turns into a liability. Teams end up carrying subtle differences in how they handle errors and retries. Operational load rises. Reliability drops.</p><p>The team redesigned the control plane around a different principle: one simple job submission interface that hides cluster lifecycle.</p><p>The redesigned system:</p><ul><li><p>Presents a single API endpoint for job submission</p></li><li><p>Internally handles cluster provisioning, monitoring, retries, and shutdown</p></li><li><p>Uses a workflow engine built on Temporal to orchestrate these steps</p></li></ul><p>The division of responsibilities is clear:</p><ul><li><p>The control plane manages metadata and configuration</p></li><li><p>Temporal workflows manage the runtime orchestration</p></li></ul><p>This improves reliability and reduces cognitive load for users. It also creates a base for smarter features such as autotuning and intelligent retry policies, since the platform now owns the whole lifecycle rather than just parts of it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xjuv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xjuv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 424w, https://substackcdn.com/image/fetch/$s_!Xjuv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 848w, 
https://substackcdn.com/image/fetch/$s_!Xjuv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 1272w, https://substackcdn.com/image/fetch/$s_!Xjuv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xjuv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif" width="1365" height="1035" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1035,&quot;width&quot;:1365,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31947,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xjuv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 424w, 
https://substackcdn.com/image/fetch/$s_!Xjuv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 848w, https://substackcdn.com/image/fetch/$s_!Xjuv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 1272w, https://substackcdn.com/image/fetch/$s_!Xjuv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0401a0c7-1c45-4787-ad78-5d4b191350e0_1365x1035.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Prism control plane (Source: Snap)</figcaption></figure></div><div><hr></div><h4><strong>Metrics and autotuning: turning Spark signals into smarter defaults</strong></h4><p>Spark and Dataproc expose a large number of metrics out of the box. The problem is not availability; it is usability.</p><p>The raw metrics:</p><ul><li><p>Exist at multiple levels: job, stage, task, cluster</p></li><li><p>Are not always structured for time-series analysis</p></li><li><p>Use inconsistent naming and retention policies across sources</p></li><li><p>Are difficult to use as inputs for automation</p></li></ul><p>To fix this, Snap built a dedicated metrics system for Spark workloads in Prism.</p><p>This system:</p><ul><li><p>Ingests selected signals from jobs, clusters, and infrastructure</p></li><li><p>Normalises them into a coherent schema</p></li><li><p>Stores them in a centralised Spanner database for durability and consistency</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TpXl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TpXl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 424w, https://substackcdn.com/image/fetch/$s_!TpXl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 848w,
https://substackcdn.com/image/fetch/$s_!TpXl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 1272w, https://substackcdn.com/image/fetch/$s_!TpXl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TpXl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif" width="1456" height="539" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:539,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28905,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TpXl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 424w, 
https://substackcdn.com/image/fetch/$s_!TpXl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 848w, https://substackcdn.com/image/fetch/$s_!TpXl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 1272w, https://substackcdn.com/image/fetch/$s_!TpXl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6149b1a6-ba47-4779-ba1a-9749453d186f_1999x740.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Prism metric architecture (Source: Snap)</figcaption></figure></div><p>With this foundation, Prism can:</p><ul><li><p>Show actionable metrics in the UI console</p></li><li><p>Back features like autoscaling and intelligent retries</p></li><li><p>Support autotuning features that adjust configuration based on observed behavior</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wh4a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wh4a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 424w, https://substackcdn.com/image/fetch/$s_!wh4a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 848w, https://substackcdn.com/image/fetch/$s_!wh4a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 1272w, https://substackcdn.com/image/fetch/$s_!wh4a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wh4a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif" width="1065" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1065,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24456,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wh4a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 424w, https://substackcdn.com/image/fetch/$s_!wh4a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 848w, https://substackcdn.com/image/fetch/$s_!wh4a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 1272w, https://substackcdn.com/image/fetch/$s_!wh4a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16b71b3-e0c7-4f1c-bf75-ca2eb85fd0c7_1065x800.avif 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Prism metrics UI (Source: Snap)</figcaption></figure></div><p>Importantly, this metrics work was done in parallel with the UI console, so both evolved together. The result is a unified experience where what the user sees and what the automation uses come from the same underlying system.</p><div><hr></div><h4><strong>How Prism spread across Snap</strong></h4><p>The clearest sign that a platform is working is usage. 
Prism&#8217;s daily job counts have climbed from single digits to several thousand per day, with peaks above 10,000 jobs.</p><p>The pattern of adoption falls into two main buckets.</p><ol><li><p><strong>Direct use by advanced Spark teams: </strong>Teams with complex Spark needs use Prism directly. Their workloads often involve:</p></li></ol><ul><li><p>Large joins</p></li><li><p>Tight coupling with specific data models</p></li><li><p>Custom logic that does not fit into a narrow &#8220;standard pipeline&#8221; box</p></li></ul><p>These teams still get value from Prism&#8217;s abstractions and control plane, but they stay close to the underlying capabilities.</p><ol start="2"><li><p><strong>Integration into internal platforms: </strong>Other teams do not think in terms of Spark at all. They work with internal tools for:</p></li></ol><ul><li><p>ML data preparation</p></li><li><p>Feature engineering</p></li><li><p>Experimentation</p></li></ul><p>Those tools, in turn, embed Prism. The teams&#8217; users get domain-specific interfaces, while Prism quietly runs the heavy Spark work in the background.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WJDg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WJDg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 424w, https://substackcdn.com/image/fetch/$s_!WJDg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 848w, 
https://substackcdn.com/image/fetch/$s_!WJDg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 1272w, https://substackcdn.com/image/fetch/$s_!WJDg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WJDg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif" width="1390" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1390,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21651,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/avif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/179211962?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WJDg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 424w, 
https://substackcdn.com/image/fetch/$s_!WJDg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 848w, https://substackcdn.com/image/fetch/$s_!WJDg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 1272w, https://substackcdn.com/image/fetch/$s_!WJDg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F146482cb-18cf-4df7-94aa-1868ed8aaae0_1390x800.avif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Supporting both direct and embedded use is a crucial design choice. It lets Prism spread across Snap without forcing every user into the same interface or abstraction level.</p><div><hr></div><h3>The full scoop</h3><p>To learn more about this, check out <a href="https://eng.snap.com/prism">Snapchat's Engineering Blog</a> post on this topic.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you liked this post and don&#8217;t want to miss the next one, subscribe to Data Tinkerer!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>If you are already subscribed and enjoyed the article, please give it a like and/or share it with others. I really appreciate it &#128591;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/p/how-snap-rebuilt-its-ml-platform-to-handle-10000-daily-spark-jobs?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h3>Keep learning</h3><div 
class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8b717c15-913f-4e54-91a7-fb3f26e15721&quot;,&quot;caption&quot;:&quot;How do you keep data fresh for millions of merchants when you&#8217;re streaming from 100+ MySQL shards?<br /><br />Shopify&#8217;s answer: a 400TB Change Data Capture platform that pushes up to 100k events a second.<br /><br />This post dives into the trade-offs, the challenges and the lessons learned from building CDC at scale.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Shopify Uses Change Data Capture to Serve Millions of Merchants&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-09-18T07:53:42.206Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1730818874996-dea4bddf5554?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaG9waWZ5fGVufDB8fHx8MTc1ODE4MDY0NHww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-shopify-uses-change-data-capture-to-serve-millions-of-merchants&quot;,&quot;section_name&quot;:&quot;Data Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:173822667,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:3422740,&quot;publication_name&quot;:&quot;Data 
Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;885df221-54f0-4845-8a4f-af4f0d0e2648&quot;,&quot;caption&quot;:&quot;Cold starts, version drift and clunky notebooks, Grab hit all the classic headaches of streaming at scale.<br /><br />Here&#8217;s how they fixed it with FlinkSQL + Kafka.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Grab Shrunk Real-Time Queries from 5 Minutes to 1 with FlinkSQL and Kafka&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:291590442,&quot;name&quot;:&quot;Data Tinkerer&quot;,&quot;bio&quot;:&quot;Ex-head of analytics sharing deep dives and learnings about AI and all things data (science, engineering, analysis)&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d3bbe0-5fb8-4f8d-9b74-036abbd6fec9_500x500.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-21T06:45:35.215Z&quot;,&quot;cover_image&quot;:&quot;https://images.unsplash.com/photo-1587476351660-e9fa4bb8b26c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxncmFifGVufDB8fHx8MTc1NTQ5OTAzOHww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.datatinkerer.io/p/how-grab-shrunk-real-time-queries-from-five-minutes-to-one&quot;,&quot;section_name&quot;:&quot;Data 
Engineering&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:171226398,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:4,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Data Tinkerer&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JEdj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bea5ccd-f356-4154-a1db-16268300510e_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[From Data Analyst to Senior DS Manager at Skyscanner]]></title><description><![CDATA[How a mechanical engineer found data through robotics. Data led to modelling. Modelling led to managing teams at Skyscanner.]]></description><link>https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner</link><guid isPermaLink="false">https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 13 Nov 2025 03:54:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/06735d58-e8f2-4106-88ae-efe0658c217c_764x661.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>Following on from previous posts talking to people in the field, today I will be talking with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Jose Parre&#241;o 
Garcia&quot;,&quot;id&quot;:255728031,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!h_mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4dad41-478b-4960-a5e0-98ed1e54657e_1168x1046.jpeg&quot;,&quot;uuid&quot;:&quot;fe93d583-b52c-4160-878b-e32c4f822419&quot;}" data-component-name="MentionToDOM"></span> who is a Senior Data Science Manager at Skyscanner and writer of the <em>Senior Data Science Lead</em> newsletter.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2833541,&quot;name&quot;:&quot;Senior Data Science Lead&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!t4IN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbe3704e-4589-40b2-bbb8-007336c4f09a_990x990.png&quot;,&quot;base_url&quot;:&quot;https://joseparreogarcia.substack.com&quot;,&quot;hero_text&quot;:&quot;Helping managers build world-class teams, data professionals master storytelling and guiding those looking to break into Data Science. I have built teams from scratch and lead 50+ data scientists. 
Now, I share my experience with you.&quot;,&quot;author_name&quot;:&quot;Jose Parre&#241;o Garcia&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://joseparreogarcia.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!t4IN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbe3704e-4589-40b2-bbb8-007336c4f09a_990x990.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">Senior Data Science Lead</span><div class="embedded-publication-hero-text">Helping managers build world-class teams, data professionals master storytelling and guiding those looking to break into Data Science. I have built teams from scratch and lead 50+ data scientists. Now, I share my experience with you.</div><div class="embedded-publication-author-name">By Jose Parre&#241;o Garcia</div></a><form class="embedded-publication-subscribe" method="GET" action="https://joseparreogarcia.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p>We talked about his rise from data analyst to Senior DS Manager at Skyscanner, what &#8220;production-ready&#8221; really means and why the real intelligence in data science lives before and after the model.</p><p>So without further ado, let&#8217;s get into it!</p>
      <p>
          <a href="https://www.datatinkerer.io/p/from-data-analyst-to-senior-ds-manager-at-skyscanner">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in October 2025]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in October 2025.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 06 Nov 2025 07:22:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a00481e6-bc3b-4419-9304-ed408b193853_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OR1N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OR1N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!OR1N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OR1N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/178132882?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OR1N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!OR1N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!OR1N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918f7663-13a5-41a8-bdaa-35c9ac058e66_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). 
It includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for October.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[From Dental Cleaning to Data Cleaning: How I Pivoted to Healthcare Analytics]]></title><description><![CDATA[How a simple Google search led to SQL. SQL led to a new career. And that career led to teaching others on LinkedIn Learning.]]></description><link>https://www.datatinkerer.io/p/from-dental-cleaning-to-data-cleaning-pivoting-to-healthcare-analytics</link><guid isPermaLink="false">https://www.datatinkerer.io/p/from-dental-cleaning-to-data-cleaning-pivoting-to-healthcare-analytics</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 23 Oct 2025 03:01:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/10e64530-327b-4752-b767-cb3fd1d900e1_764x767.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>I&#8217;m continuing the series where we hear from people working in data - how they got here, what they&#8217;ve learned and what their day-to-day really looks like.</p><p>Meet <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Thais Cooke&quot;,&quot;id&quot;:61993584,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b384f538-870a-44af-809c-33b4fc046389_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;9becde07-316e-44e8-b766-9e8c521b03f2&quot;}" data-component-name="MentionToDOM"></span> who is a Senior Healthcare Data Analyst and writer of the <em>Journal of a Data Analyst </em>newsletter.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:3998386,&quot;name&quot;:&quot;Journal of a Data 
Analyst&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u5z_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72c246f-8a0c-4dfd-a3af-1afccfe23146_608x608.png&quot;,&quot;base_url&quot;:&quot;https://thaiscooke.substack.com&quot;,&quot;hero_text&quot;:&quot;A journal-style newsletter exploring the data industry and careers - mixing insights, discussions, and practical knowledge.&quot;,&quot;author_name&quot;:&quot;Thais Cooke&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#fef2f2&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://thaiscooke.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!u5z_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd72c246f-8a0c-4dfd-a3af-1afccfe23146_608x608.png" width="56" height="56" style="background-color: rgb(254, 242, 242);"><span class="embedded-publication-name">Journal of a Data Analyst</span><div class="embedded-publication-hero-text">A journal-style newsletter exploring the data industry and careers - mixing insights, discussions, and practical knowledge.</div><div class="embedded-publication-author-name">By Thais Cooke</div></a><form class="embedded-publication-subscribe" method="GET" action="https://thaiscooke.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p>We talked about her unplanned pivot into data, what 
healthcare data analysis looks like and how she thought she was being scammed by a LinkedIn &#8216;Impostor&#8217;.</p><p>So without further ado, let&#8217;s get into it!</p>
      <p>
          <a href="https://www.datatinkerer.io/p/from-dental-cleaning-to-data-cleaning-pivoting-to-healthcare-analytics">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[From Marketing to Data Engineering: How I Made the Switch]]></title><description><![CDATA[How one marketer followed the trail of tracking pixels into pipelines and built a career turning messy data into usable systems.]]></description><link>https://www.datatinkerer.io/p/from-marketing-to-data-engineering-how-i-made-the-switch</link><guid isPermaLink="false">https://www.datatinkerer.io/p/from-marketing-to-data-engineering-how-i-made-the-switch</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 16 Oct 2025 04:01:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9f252334-7437-40b5-9f82-08c981de2f6d_761x764.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers,</p><p>Lately, I&#8217;ve been thinking about starting a new series where people working in data share how they got here, what they&#8217;ve learned along the way and what their day-to-day looks like.</p><p>So, I&#8217;m kicking it off today with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Alejandro Aboy&quot;,&quot;id&quot;:22949723,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u1Ao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdca2c63d-9f5e-4cd3-99ac-7d8e71dc114b_1024x1024.jpeg&quot;,&quot;uuid&quot;:&quot;ee6ab692-9d69-4714-8986-9b599e2b5557&quot;}" data-component-name="MentionToDOM">Alejandro Aboy</span>, Senior Data Engineer at Workpath and writer of <em>The Pipe and The Line</em> newsletter.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:1196229,&quot;name&quot;:&quot;The Pipe &amp; The 
Line&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!vmrQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d5b2131-da28-4621-ad6f-9574cbc41a1e_500x500.png&quot;,&quot;base_url&quot;:&quot;https://thepipeandtheline.substack.com&quot;,&quot;hero_text&quot;:&quot;Hands-on guides, tools, and experiments to sharpen your Data &amp; AI Engineering skills from someone who learned it all in the wild.&quot;,&quot;author_name&quot;:&quot;Alejandro Aboy&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#131826&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://thepipeandtheline.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!vmrQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d5b2131-da28-4621-ad6f-9574cbc41a1e_500x500.png" width="56" height="56" style="background-color: rgb(19, 24, 38);"><span class="embedded-publication-name">The Pipe &amp; The Line</span><div class="embedded-publication-hero-text">Hands-on guides, tools, and experiments to sharpen your Data &amp; AI Engineering skills from someone who learned it all in the wild.</div><div class="embedded-publication-author-name">By Alejandro Aboy</div></a><form class="embedded-publication-subscribe" method="GET" action="https://thepipeandtheline.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><p>We talked 
about how he went from marketing to data engineering, what his workflow looks like, why he was called an <em>octopus</em> and why he thinks &#8220;big data&#8221; is a fool&#8217;s errand for most teams.</p><p>So without further ado, let&#8217;s get into it!</p>
      <p>
          <a href="https://www.datatinkerer.io/p/from-marketing-to-data-engineering-how-i-made-the-switch">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[How Dropbox Made AI Evaluation Work at Scale]]></title><description><![CDATA[Every prompt, retriever and model now has to earn its merge.]]></description><link>https://www.datatinkerer.io/p/how-dropbox-made-ai-evaluation-work-at-scale</link><guid isPermaLink="false">https://www.datatinkerer.io/p/how-dropbox-made-ai-evaluation-work-at-scale</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 09 Oct 2025 07:14:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oMNY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a58eb26-c9ac-492d-96f2-343a7f503ddc_800x450.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers!</p><p>Today we will look at how Dropbox runs evals for its conversational AI.</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Roe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Roe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!2Roe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/175671629?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Roe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!2Roe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!2Roe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e760a1-29af-4234-b7eb-7b6070bb0d44_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). 
It includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.datatinkerer.io/leaderboard?&amp;referrer_token=4tlsmi&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Now, with that out of the way, let&#8217;s get to Dropbox&#8217;s AI evaluation!</p><h3>TL;DR</h3>
      <p>
          <a href="https://www.datatinkerer.io/p/how-dropbox-made-ai-evaluation-work-at-scale">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[What the Data Crowd Was Reading in September 2025]]></title><description><![CDATA[Tools, techniques and deep dives worth reading that I came across in September 2025.]]></description><link>https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-september-2025</link><guid isPermaLink="false">https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-september-2025</guid><dc:creator><![CDATA[Data Tinkerer]]></dc:creator><pubDate>Thu, 02 Oct 2025 07:19:13 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/45dce40b-ba8c-432c-88c2-c2e8ab6c888f_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fellow Data Tinkerers</p><p>It&#8217;s time for another round-up on all things data!</p><p>But before that, I wanted to share with you what you could unlock if you share Data Tinkerer with just <strong>1 more person</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bZIb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bZIb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!bZIb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!bZIb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!bZIb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bZIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif" width="800" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369150,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.datatinkerer.io/i/174976801?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bZIb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 424w, https://substackcdn.com/image/fetch/$s_!bZIb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 848w, 
https://substackcdn.com/image/fetch/$s_!bZIb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 1272w, https://substackcdn.com/image/fetch/$s_!bZIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a9e6319-3141-4d78-9204-2debce75a3ea_800x402.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There are 100+ resources to learn all things data (science, engineering, analysis). 
It includes videos, courses and projects, and can be filtered by tech stack (Python, SQL, Spark, etc.), skill level (Beginner, Intermediate and so on), provider name or free/paid. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.datatinkerer.io/leaderboard?&amp;utm_source=post&quot;,&quot;text&quot;:&quot;Refer a friend&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.datatinkerer.io/leaderboard?&amp;utm_source=post"><span>Refer a friend</span></a></p><p>Without further ado, let&#8217;s get to the round-up for September.</p>
      <p>
          <a href="https://www.datatinkerer.io/p/what-the-data-crowd-was-reading-in-september-2025">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>