Paige Roberts

Author, Speaker, Data Nerd


Paige Roberts has worked as an engineer, trainer, support technician, technical writer, marketer, product manager, and consultant over the last 28 years. She has built data engineering pipelines and architectures, documented and tested open source analytics implementations, worked with different industries, and questioned a lot of assumptions. She has worked for companies like Pervasive, the Bloor Group, Actian, Hortonworks, Syncsort, Vertica, and now, GridGain. An IASA certified enterprise data architect, she contributed to "97 Things Every Data Engineer Should Know," and co-authored "Accelerate Machine Learning with a Unified Analytics Architecture" and "Up and Running with Aerospike," all published by O'Reilly. Thinkers360 lists her as a top 10 thought leader in Data Architecture, and top 25 in both Analytics and Big Data. She promotes understanding of high-scale data engineering, real-time distributed data processing, and how the analytics revolution is changing the world.

Area of Expertise

Information & Communications Technology

Topics

Enterprise Architecture
Solution Architecture
Data Architecture
Information Architecture
Analytics
BI & Analytics
Big Data Analytics
Big Data
Streaming Data Analytics
Software Analytics
Database and Analytics
Data analysis
Cloud analytics
Analytics Engineering
Cloud Architecture
Analytics and Big Data
Data Management
Big Data Clusters
Data Analytics
Big Data Machine Learning AI and Analytics
Machine Learning and Artificial Intelligence
Machine Learning & AI
AI & Machine Learning
Machine Learning Engineering
Machine Learning

Data Connect 2024

Streaming Graph Processing on Categorical Data Enables Real-time Risk Calculation

Failures like that of Silicon Valley Bank in 2023 are the extreme result of not calculating risk accurately and in a timely manner. Nearly every financial institution focuses on minimizing risk, but calculating it inherently requires close analysis of categorical data and relationships. Yet the majority of our algorithms only work on static, numeric data. That means persisting the data, converting it with something like one-hot encoding into numerical data that is bloated, sparse, and slow to analyze, and then, after analysis, often converting again to recover the original categories. This is painfully slow, with the state of the art measured in hours. If we could shift that analysis left and process the original categorical data as it streams in, without modification, we could cut mean time to insight down to seconds and possibly save financial institutions large sums of money. That could also enable many other options, such as using graph NLP on flowing data, finding novel behavior, and detecting anomalies such as cyber-attacks before they affect systems. The speed of an in-line data processing engine like Flink or ksqlDB combined with graph algorithms and categorical analysis is uniquely powerful. Come learn about a new open source streaming intelligence system that changes the game for risk analysis and other fast categorical data processing.
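
To make the encoding overhead described in this abstract concrete, here is a minimal Python sketch of one-hot encoding a single categorical column. The counterparty names and the helper functions (`one_hot`, `sparsity`, `decode`) are hypothetical, invented for illustration; they are not taken from the talk or from the system it describes.

```python
from collections import Counter

def one_hot(values):
    """Encode each value as a 0/1 list with one slot per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

def sparsity(rows):
    """Fraction of cells in the encoded matrix that are zero."""
    total = sum(len(row) for row in rows)
    zeros = sum(row.count(0) for row in rows)
    return zeros / total if total else 0.0

def decode(categories, row):
    """Convert an encoded row back to its original category."""
    return categories[row.index(1)]

# Hypothetical categorical column: one counterparty per transaction.
counterparties = ["bank_a", "bank_b", "fund_c", "bank_a", "broker_d"]

categories, rows = one_hot(counterparties)
width = len(rows[0])            # one numeric column per distinct category
zero_fraction = sparsity(rows)  # grows toward 1.0 as categories multiply

# Working on the categories directly needs no encode/decode round trip.
direct_counts = Counter(counterparties)
```

With only four distinct values, each record already becomes four numeric columns that are mostly zeros, and a decode step is needed to recover the category. With thousands of distinct categories, the width and the share of zeros grow accordingly.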

July 2024 Columbus, Ohio, United States

DBTA Data Summit 2024

Get Better Analytics by Putting Less Data in Your Database

A recent survey showed that 67% of companies had their software budgets cut during 2023. SaaS databases are easy to use and powerful, but they put a strain on budgets. Still, no one can afford to skimp on smart data analytics. How do you get more analytics out of your SaaS data warehouse/lakehouse, without spending more money? Treat incoming data streams as a graph. Relationships and categories of data can immediately be seen and acted upon. Duplicate entities can be resolved. Key pattern signals in noisy data streams can be pinpointed and the noise that you don’t need tossed out. By putting only relevant and clean data into analytical repositories, tons of useless data never have to be stored in pay-per-use systems, vastly reducing costs. You get smarter answers on clean, pre-filtered data in real time.
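
As a rough illustration of pre-filtering a stream before it reaches a pay-per-use store, the Python sketch below drops irrelevant events and resolves trivially duplicated entities by normalizing their names. It is a simple key-based stand-in, not the graph-based approach the talk describes, and the event fields (`entity`, `type`) and the `prefilter` helper are assumptions made for this example.

```python
def prefilter(events, is_relevant):
    """Yield only relevant events, skipping repeats of the same entity and type.

    Entities are matched on a normalized name (trimmed, lowercased), which is
    a deliberately naive form of duplicate resolution.
    """
    seen = set()
    for event in events:
        if not is_relevant(event):
            continue  # noise: never stored downstream
        key = (event["entity"].strip().lower(), event["type"])
        if key in seen:
            continue  # same entity and event type already kept
        seen.add(key)
        yield event

# Hypothetical incoming stream.
events = [
    {"entity": "Acme Corp", "type": "login_failed"},
    {"entity": "acme corp ", "type": "login_failed"},  # duplicate entity
    {"entity": "Acme Corp", "type": "heartbeat"},      # noise
    {"entity": "Globex", "type": "login_failed"},
]

kept = list(prefilter(events, lambda e: e["type"] != "heartbeat"))
```

Only the events in `kept` would be written to the warehouse or lakehouse; the rest never incur storage or query cost.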

May 2024 Boston, Massachusetts, United States

Data Day Texas 2024

When linear scaling is too slow – strategies for high scale data processing

How does The Trade Desk handle 10 million ad auctions per second and generate 40 thousand reports in less than 6 hours on 15 petabytes of data? If you want to crunch all the data to train an LLM, handle real-time machine-scale IoT problems for AIOps, or juggle millions of transactions per second, linear scaling is far too slow.
Is the answer a 1000-node database with a ton of memory on every node? If it were, companies like The Trade Desk would have to declare bankruptcy. Throwing more nodes or serverless executors at the problem, whether in the cloud or on-premises, is neither the only solution nor even a good one. You will rapidly hit both performance and cost limitations, with diminishing returns.
So, how do extreme high-scale databases keep up? What strategies in both open source and proprietary data processing systems leave linear scaling in the dust, without eating up corporate ROI? In this talk, you’ll learn some of the strategies that provide affordable reverse linear scaling for multiple modern databases, and which direction the future of data processing is going.

January 2024 Austin, Texas, United States

Events

Data Connect 2024

DBTA Data Summit 2024

Data Day Texas 2024

DBTA Data Summit 2023

IWCE

SINC TOLA IT & Security Leaders Forum

Data Day Texas

Big Data Europe

Data Science Camp

Data Science Summit

DBTA Data Summit

ODSC West

Big Data London
