Zuck is Watching You, Agents Get Smarter and AI Videos Get Less Weird

10M-token prompts, multitasking agents, and smooth AI-generated short films, all in one week.

Apr 07, 2025

Fellow Data Tinkerers!

Thank you to those who provided feedback last week and to those who shared the publication with others. I really appreciate it ❤️

I wanted to share an example of what you could unlock if you share Data Tinkerer with just 3 other people.

There are 100+ more cheat sheets covering everything from Python, R, SQL and Spark to Power BI, Tableau, Git and more. So if you know other people who like staying up to date on all things data, please share Data Tinkerer with them!


Now, with that out of the way, let’s get to this week’s news roundup on all things data and AI.

The Buzz 🐝

It was a busy weekend at Meta’s headquarters as they dropped their latest Llama 4 models on Saturday:

Their biggest model, Behemoth, is still in preview and has not been released yet. But what caught everyone’s attention was the huge context window (1 million tokens for Maverick and 10 million for Scout). While the context window is impressive, the jury is still out on the quality of the answers. Some people doubt that output quality will hold up when prompts get that long.

A more important question that Zuck is dying to know is:

AI search startup Genspark has dropped a new general-purpose ‘Super Agent’ that outperforms both the viral Manus agent and OpenAI’s Deep Research on the GAIA benchmark.

It handles multi-step tasks like building a travel plan, picking nearby restaurants, and even calling to book a table. You can check out their introduction video below.
(P.S. I thought the video was created by AI but apparently it’s real. Let me know what you think!)

Runway dropped Gen-4, a new AI video model with sharper visuals and scene consistency, letting creators generate coherent, long-form content with zero extra training. They also shared some interesting short films made with Gen-4 that you can check below:

This comment on the video summarizes the whole thing pretty well:

Data Science & AI

Inside the Mind of an LLM
Anthropic recently published an interesting piece of research looking inside Claude’s “brain”. In this article, we have summarized the interesting tidbits and the key takeaways. So if you want to see what happens under the hood of an LLM, give this a read.

OpenAI Academy
OpenAI just expanded its free AI Academy, adding more tutorials, live workshops, and beginner-friendly content like “AI for older adults”.
Finally, a class where both participants and the model respond with “I’m sorry, could you repeat that?”

Improving Search for 1B+ LinkedIn Users with GenAI
Discover how LinkedIn used AI to refine search suggestions and create a better user experience.

Data Engineering

How to Design Customizable Data Indexing Pipelines
This article walks through how to build flexible data indexing pipelines, covering steps like parsing PDFs, splitting documents into chunks, and generating embeddings for better search. It also dives into boosting results with metadata enrichment and a mix of TF-IDF and vector search.
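
For the curious, here is a minimal sketch of the chunking and hybrid-ranking steps mentioned above. It is not the linked article's code: every function name is hypothetical, `toy_embedding` is a hashed character-trigram stand-in for a real embedding model, and the `alpha` blend weight is an assumed way of mixing the two scores. PDF parsing and metadata enrichment are left out.

```python
import math
import re
import zlib
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into overlapping windows of words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def toy_embedding(text, dim=256):
    """Stand-in for a real embedding model: hashed character-trigram counts."""
    normalized = " ".join(tokenize(text))
    vec = [0.0] * dim
    for i in range(len(normalized) - 2):
        bucket = zlib.crc32(normalized[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    return vec


def cosine_sparse(a, b):
    """Cosine similarity between two sparse term-to-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def cosine_dense(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, chunks, alpha=0.5):
    """Rank chunks by alpha * TF-IDF score + (1 - alpha) * embedding score."""
    token_lists = [tokenize(c) for c in chunks]
    idf = build_idf(token_lists)
    query_tfidf = tfidf_vector(tokenize(query), idf)
    query_emb = toy_embedding(query)
    scored = []
    for chunk, tokens in zip(chunks, token_lists):
        keyword_score = cosine_sparse(query_tfidf, tfidf_vector(tokens, idf))
        vector_score = cosine_dense(query_emb, toy_embedding(chunk))
        scored.append((alpha * keyword_score + (1 - alpha) * vector_score, chunk))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `hybrid_search("your question", split_into_chunks(document_text))` returns `(score, chunk)` pairs, best match first. Setting `alpha` closer to 1 favours exact keyword matches, while values closer to 0 lean on the embedding similarity. In a real pipeline you would swap `toy_embedding` for an actual embedding model.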

How Datadog Achieved 99% Timeout Reduction with 20x Scalability Boost
Learn about the architecture that cut the company’s costs by 50%.

Data Analysis and Visualisation

The Reality of Using SQL Day-to-Day as a Data Analyst
This article discusses the practical challenges of using SQL in a real-world data analyst role, along with insights gained from doing so.

The World’s Data Centres
An interesting visualisation of the number of data centres around the world.

Meme of the Week - Putting compute to good use

If you have any other feedback, please reply to this email. I read all replies and really appreciate the feedback :)

Data Tinkerer