Rif Kiamil

I teach and educate about Data, SQL & Google BigQuery

London, United Kingdom


As a Google Developer Expert, I teach and educate minorities and marginalised groups on Data, SQL & Google BigQuery using sports and blockchain data.

Area of Expertise

Finance & Banking
Information & Communications Technology
Physical & Life Sciences
Transports & Logistics

Topics

Google Cloud
Google Developer Experts
Google Developer Group
Google for Startups
Google App Engine
Google BigQuery
Database
Analytics and Big Data
Dynamics ERP
Data Warehousing
SQL Server
SQL Server Analysis Services
Microsoft Dynamics 365
Microsoft PowerBI
SIMD
DuckDB
Parquet
Looker
LookML
Vectors
LLMs
TPU
GPU
JAX
LLMOps

From #NoSQL to #NoDatawarehouse: A technically light-hearted history of enterprise database

In this talk, we will explore the concept of #NoDataWarehouse and address the confusion around data terms in the context of modern enterprise data management. We will delve into the history of data modelling and databases, discussing the different types of systems, such as OLAP and OLTP, and their relevance in today's ever-changing data landscape. Furthermore, we will examine the impact of innovations from Microsoft, Google and Snowflake, including Google App Engine and BigQuery, and the influence of the MapReduce, Bigtable and Dremel white papers in shaping the world of data management.

Database & SQL Performance Tuning - What Happens to Data Types as They Enter the Shadows of the Data

📌 TL;DR - This presentation explores key encoding strategies (Offset Encoding, Dictionary Encoding, Bit-Packing, RLE, and Delta Encoding) and how data ordering influences their efficiency. You'll learn how data type selection affects query execution, encoding efficiency, and cost optimization in modern data warehouse systems.

🎯 Key Takeaways
By the end of this session, you should walk away with a clear understanding of:
* How your choices in data type selection can influence a data warehouse, potentially lowering costs through efficient encoding.
* How DuckDB/Parquet (open-source), Google BigQuery & Azure Synapse Dedicated Pool (VertiPaq Engine) implement this in practice.
* How your data type affects query execution by limiting operator choices or leveraging low-level CPU features like Single Instruction, Multiple Data (SIMD).
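
As a rough illustration of the encoding and ordering points above, here is a minimal Python sketch of Run-Length Encoding and Dictionary Encoding. The function names and the sample column are hypothetical and chosen only for this example; real engines such as Parquet or BigQuery use far more elaborate implementations.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = {}
    codes = []
    for value in values:
        # The first time a value is seen, it gets the next free code.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


# Hypothetical column of country codes in arrival order.
column = ["UK", "FR", "UK", "DE", "FR", "UK", "DE", "UK"]

unsorted_runs = run_length_encode(column)
sorted_runs = run_length_encode(sorted(column))
dictionary, codes = dictionary_encode(column)

print(len(unsorted_runs))  # 8 runs: no two neighbours are equal
print(sorted_runs)         # [('DE', 2), ('FR', 2), ('UK', 4)]
print(codes)               # [0, 1, 0, 2, 1, 0, 2, 0]
```

The same eight values need eight runs in arrival order but only three once sorted, which is the sense in which data ordering influences encoding efficiency.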

Google TPU-First, GPU-Second: Understanding Data Encoding, Arrow & Parquet’s Impact on Training

Most AI companies shout about GPUs. But here’s the truth: Google TPUs are the real cheat code. It’s not just about raw FLOPs or benchmark chasing; it’s about pipelines. When you design your machine learning stack TPU-first, GPU-second, you unlock massive gains in price and performance. This talk starts at the roots, by building an understanding of machine mentality. We’ll explore foundational concepts like Instructions Per Cycle (IPC) and vector processing, not just as theoretical metrics but as tools to reshape how you prepare, encode, and move data through your pipeline.

Even if you're new to data encoding, Arrow, Parquet, JAX, or TPUs, you’ll walk away with clear mental models and practical insights on how these pieces connect to build performant ML systems.

Full Title - Google TPU-First, GPU-Second: Understanding Data Encoding—Arrow and Parquet’s Impact on Training Pipelines

Machine Mentality: Instructions Per Cycle & Vectors, Why Databases and LLMs Care

In “Machine Mentality: Instructions Per Cycle & Vectors, Why Databases and LLMs Care,” we peel back the layers of modern compute architectures to reveal how low-level encoding and vectorized execution, driven by Instructions Per Cycle (IPC) and Single Instruction, Multiple Data (SIMD) pipelines, influence databases, data warehouses and large language models (LLMs).

What You’ll Learn

** Encoding Strategies & Data Ordering
Explore how Offset, Dictionary, Bit-Packing, Run-Length, and Delta encodings interact with cache lines and prefetchers—and why ordering your data can make or break performance.

** Instruction Throughput & Vectorization
Understand how SIMD‐driven operators leverage wide registers to execute multiple elements per cycle, and why maximizing IPC matters for both query engines and transformer inference.

** Tokenization
Learn how the core of every transformer block consists of a small handful of linear algebra routines, and how SIMD makes high tokens-per-second speeds achievable.
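
To make the Delta and Bit-Packing encodings from the list above a little more concrete, the following is a minimal Python sketch under stated assumptions: the helper names and the sample timestamps are hypothetical, and the code only counts the bit width a packed layout would need rather than performing the packing itself.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def bits_needed(values):
    """Smallest bit width that can hold every non-negative value."""
    return max(max(values).bit_length(), 1)


# Hypothetical sorted timestamps (seconds since the epoch).
timestamps = [1_700_000_000, 1_700_000_003, 1_700_000_004,
              1_700_000_009, 1_700_000_010]

deltas = delta_encode(timestamps)
raw_bits = bits_needed(timestamps)    # width per value without delta encoding
packed_bits = bits_needed(deltas[1:])  # width per delta after the first value

print(deltas[1:])             # [3, 1, 5, 1]
print(raw_bits, packed_bits)  # 31 3
```

Because the values are sorted, the deltas stay small and fit in 3 bits each instead of 31. Narrow, fixed-width values like these are also the kind of layout that vectorized operators can process many at a time.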

I/O Extended 2023 Sessionize Event

June 2023 Glasgow, United Kingdom


