Paige Roberts

Author, Speaker, Data Nerd

Paige Roberts has worked as an engineer, trainer, support technician, technical writer, marketer, product manager, and consultant over the last 28 years. She's built data engineering pipelines and architectures, documented and tested open source analytics implementations, worked with different industries, and questioned a lot of assumptions. She's worked for companies like Pervasive, the Bloor Group, Actian, Hortonworks, Syncsort, Vertica, and now GridGain. An IASA-certified enterprise data architect, she contributed to "97 Things Every Data Engineer Should Know," and co-authored "Accelerate Machine Learning with a Unified Analytics Architecture" and "Up and Running with Aerospike" from O'Reilly. Thinkers360 lists her as a top 10 thought leader in Data Architecture, and top 25 in both Analytics and Big Data. She promotes understanding of high-scale data engineering, real-time distributed data processing, and how the analytics revolution is changing the world.

Area of Expertise

  • Information & Communications Technology

Topics

  • Enterprise Architecture
  • Solution Architecture
  • Data Architecture
  • information architecture
  • Analytics
  • BI & Analytics
  • Big Data Analytics
  • Big Data
  • Streaming Data Analytics
  • Software Analytics
  • Database and Analytics
  • Data analysis
  • Cloud analytics
  • analytics engineering
  • Cloud Architecture
  • Analytics and Big Data
  • Data Management
  • big data clusters
  • Data Analytics
  • Big Data Machine Learning AI and Analytics
  • Machine Learning and Artificial Intelligence
  • Machine Learning & AI
  • AI & Machine Learning
  • Machine Learning Engineering
  • Machine Learning

Data Connect 2024

Streaming Graph Processing on Categorical Data Enables Real-time Risk Calculation

Failures like that of Silicon Valley Bank in 2023 are the extreme result of not calculating risk accurately and in a timely manner. Nearly every financial institution focuses on minimizing risk, but the way we calculate risk inherently requires close analysis of categorical data and relationships. Yet the majority of our algorithms only work on static, numeric data. That means persisting the data, converting it with something like one-hot encoding into numerical data that is bloated, sparse, and slow to analyze, and then, after analysis, often converting it back to recover the original categories. This is painfully slow, with the state of the art measured in hours. If we could shift that analysis left and process the original categorical data as it streams in, without modification, we could cut mean time to insight down to seconds and possibly save financial institutions a great deal of money. That could also enable many other options, such as using graph NLP on flowing data, finding novel behavior, and detecting anomalies such as cyber-attacks before they affect systems. The speed of an in-line data processing engine like Flink or ksqlDB, combined with graph algorithms and categorical analysis, is uniquely powerful. Come learn about a new open source streaming intelligence system that changes the game for risk analysis and other fast categorical data processing.

July 2024 Columbus, Ohio, United States

DBTA Data Summit 2024

Get Better Analytics by Putting Less Data in Your Database

A recent survey showed that 67% of companies had their software budgets cut during 2023. SaaS databases are easy to use and powerful, but they put a strain on budgets. Still, no one can afford to skimp on smart data analytics. How do you get more analytics out of your SaaS data warehouse/lakehouse, without spending more money? Treat incoming data streams as a graph. Relationships and categories of data can immediately be seen and acted upon. Duplicate entities can be resolved. Key pattern signals in noisy data streams can be pinpointed and the noise that you don’t need tossed out. By putting only relevant and clean data into analytical repositories, tons of useless data never have to be stored in pay-per-use systems, vastly reducing costs. You get smarter answers on clean, pre-filtered data in real time.

May 2024 Boston, Massachusetts, United States

Data Day Texas 2024

When linear scaling is too slow – strategies for high scale data processing

How does The Trade Desk handle 10 million ad auctions per second and generate 40 thousand reports in less than 6 hours on 15 petabytes of data? If you want to crunch all the data to train an LLM AI model, or handle real-time machine-scale IoT problems for AIOps, or juggle millions of transactions per second, linear scaling is far too slow.
Is the answer a 1000-node database with a ton of memory on every node? If it were, companies like The Trade Desk would have to declare bankruptcy. Throwing more nodes or serverless executors at the problem, whether in the cloud or on-premises, is neither the only solution nor even a good one. You will rapidly hit both performance and cost limitations, with diminishing returns.
So, how do extreme high-scale databases keep up? What strategies in both open source and proprietary data processing systems leave linear scaling in the dust, without eating up corporate ROI? In this talk, you’ll learn some of the strategies that provide affordable reverse linear scaling for multiple modern databases, and where the future of data processing is heading.

January 2024 Austin, Texas, United States

DBTA Data Summit 2023

Mitigating Risks When Moving To The Cloud
Reduce Risk & Avoid Lock-In When Moving From On-Prem to Cloud or Hybrid

May 2023 Boston, Massachusetts, United States

IWCE

Use AIOps to Improve Uptime and Reduce Mean Time to Repair

March 2023 Las Vegas, Nevada, United States

SINC TOLA IT & Security Leaders Forum

Principles for building analytics architectures for performance and growth on a budget

March 2023 Dallas, Texas, United States

Data Day Texas

Principles for building analytics architectures for performance and growth on a budget

January 2023 Austin, Texas, United States

Big Data Europe

On virtual - Shorten the Path to Production with In-Database Machine Learning
Backup session - Provision Datasets for BI and Data Science with a Unified Data Architecture

November 2022 Vilnius, Lithuania

Data Science Camp

Shortcut MLOps with In-Database Machine Learning

August 2022

Data Science Summit

Hybrid conference - Get Projects Into Production Faster with In-Database Machine Learning

June 2022 Warsaw, Poland

DBTA Data Summit

Unifying Analytics - Changing Data Architectures to Bring BI and Data Science Together

May 2022 Boston, Massachusetts, United States

ODSC West

In-Database Machine Learning in Jupyter

November 2021 San Francisco, California, United States

Big Data London

Unifying Analytics - Changing Data Architectures to Bring BI and Data Science Together

September 2021 London, United Kingdom
