Paige Roberts
Author, Speaker, Data Nerd
Paige Roberts has worked as an engineer, trainer, support technician, technical writer, marketer, product manager, and consultant over the last 28 years. She's built data engineering pipelines and architectures, documented and tested open source analytics implementations, worked with different industries, and questioned a lot of assumptions. She's worked for companies like Pervasive, the Bloor Group, Actian, Hortonworks, Syncsort, Vertica, and now, GridGain. An IASA-certified enterprise data architect, she contributed to "97 Things Every Data Engineer Should Know," and co-authored "Accelerate Machine Learning with a Unified Analytics Architecture" and "Up and Running with Aerospike" from O'Reilly. Thinkers360 lists her as a top 10 thought leader in Data Architecture, and top 25 in both Analytics and Big Data. She promotes understanding of high-scale data engineering, real-time distributed data processing, and how the analytics revolution is changing the world.
Data Connect 2024
Streaming Graph Processing on Categorical Data Enables Real-time Risk Calculation
Failures like that of Silicon Valley Bank in 2023 are the extreme result of not calculating risk accurately and in a timely manner. Nearly every financial institution has a focus on minimizing risk, but the way we calculate it inherently requires close analysis of categorical data and relationships. Yet the majority of our algorithms only work on static, numeric data. That means persisting the data, converting it using something like one-hot encoding into numerical data that is bloated, sparse, and slow to analyze, then after analysis, often having to convert again to figure out the original categories. This is painfully slow, with the state of the art being measured in hours. If we could shift that analysis left and process the original categorical data as it streams in, without modification, that could cut mean time to insight down to seconds, and possibly save financial institutions some large dollar signs. That could also enable many other options, such as using graph NLP on flowing data, finding novel behavior, and detecting anomalies such as cyber-attacks before they affect systems. The speed of an in-line data processing engine like Flink or ksqlDB combined with graph algorithms and categorical analysis is uniquely powerful. Come learn about a new open source streaming intelligence system that changes the game for risk analysis and other fast categorical data processing.
DBTA Data Summit 2024
Get Better Analytics by Putting Less Data in Your Database
A recent survey showed that 67% of companies had their software budgets cut during 2023. SaaS databases are easy to use and powerful, but they put a strain on budgets. Still, no one can afford to skimp on smart data analytics. How do you get more analytics out of your SaaS data warehouse/lakehouse, without spending more money? Treat incoming data streams as a graph. Relationships and categories of data can immediately be seen and acted upon. Duplicate entities can be resolved. Key pattern signals in noisy data streams can be pinpointed and the noise that you don’t need tossed out. By putting only relevant and clean data into analytical repositories, tons of useless data never have to be stored in pay-per-use systems, vastly reducing costs. You get smarter answers on clean, pre-filtered data in real time.
Data Day Texas 2024
When linear scaling is too slow – strategies for high scale data processing
How does The Trade Desk handle 10 million ad auctions per second and generate 40 thousand reports in less than 6 hours on 15 petabytes of data? If you want to crunch all the data to train an LLM, or handle real-time machine scale IoT problems for AIOps, or juggle millions of transactions per second, linear scaling is far too slow.
Is the answer a 1000-node database with a ton of memory on every node? If it were, companies like The Trade Desk would have to declare bankruptcy. Throwing more nodes or serverless executors at the problem, whether in the cloud or on-premises, is neither the only solution nor even a good one. You will rapidly hit both performance and cost limitations, with diminishing returns.
So, how do extreme high scale databases keep up? What strategies in both open source and proprietary data processing systems leave linear scaling in the dust, without eating up corporate ROI? In this talk, you’ll learn some of the strategies that provide affordable reverse linear scaling for multiple modern databases, and which direction the future of data processing is going.
DBTA Data Summit 2023
Mitigating Risks When Moving To The Cloud
Reduce Risk & Avoid Lock-In When Moving From On-Prem to Cloud or Hybrid
IWCE
Use AIOps to Improve Uptime and Reduce Mean Time to Repair
SINC TOLA IT & Security Leaders Forum
Principles for building analytics architectures for performance and growth on a budget
Data Day Texas
Principles for building analytics architectures for performance and growth on a budget
Big Data Europe
On virtual - Shorten the Path to Production with In-Database Machine Learning
Backup session - Provision Datasets for BI and Data Science with a Unified Data Architecture
Data Science Camp
Shortcut MLOps with In-Database Machine Learning
Data Science Summit
Hybrid conference - Get Projects Into Production Faster with In-Database Machine Learning
DBTA Data Summit
Unifying Analytics - Changing Data Architectures to Bring BI and Data Science Together
ODSC West
In-Database Machine Learning in Jupyter
Big Data London
Unifying Analytics - Changing Data Architectures to Bring BI and Data Science Together