Speaker

Brian Wylie

MegaCow at SuperCowPowers LLC

Albuquerque, New Mexico, United States

Brian Wylie is the founder of SuperCowPowers LLC, a software development and consulting firm that specializes in the AWS SageMaker ecosystem of machine learning services. With a master’s degree in computer science and data engineering positions at Kitware, Vectra Networks, Mandiant, and Sandia National Labs, Brian’s background includes a broad set of experiences in both real-world applications and R&D activities. Brian has a passion for open-source projects (https://github.com/SuperCowPowers) and is totally up for collaborations on AWS + ML + Awesome. Random facts: Brian’s Erdős number is 3, he likes hiking and video games, and he thinks cows are super.

Area of Expertise

  • Information & Communications Technology

Topics

  • AWS Architect
  • Data Science
  • Data Engineering
  • Data Visualization

Simplifying Exploratory Data Analysis of NSM Logs

Network Security Monitoring (NSM) is a tried and true practice, enabling defenders to identify, understand, and respond to threats on their network. NSM is most often broken down into collection, detection, and analysis phases. Open-source tooling tends to focus on collection and detection, leaving some defenders wondering where to start when it comes to analysis. The sheer volume and complexity of network data often lead to data quality issues, unexpected values, outliers, and anomalies that can take a significant amount of time to process. Our presentation will use the open-source SageWorks toolkit to perform data quality checks and Exploratory Data Analysis (EDA) on NSM data. The toolkit provides an intuitive Python API and a set of Dash/Plotly web interfaces that enable processing and visualization of data sources, feature sets, and various data quality metrics.

Starting with an overview of the toolkit architecture, we will explore the range of Python classes, data processing, and visualizations it provides. We will demonstrate how its integration with AWS combines the functionality, reliability, and security of AWS with the simplicity of an easy-to-use Python API.

Specifically, we’ll cover a set of Exploratory Data Analysis (EDA) techniques and visualizations:
- Data Upload (S3 buckets/nightly processing)
- Athena Query Interfaces (SQL)
- Data Samples, Column Types, NaN counts
- Quartile computations and distribution plots
- Outliers/Anomalies
- Feature Store/Sets
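
As a rough illustration of the per-column checks in the list above (missing-value counts, quartiles, and outlier flagging), here is a minimal sketch using only the Python standard library. It does not use or represent the SageWorks API. The records, the field names (`duration`, `orig_bytes`), and the helper functions are hypothetical, and the 1.5 × IQR rule is just one common outlier heuristic chosen for the example.

```python
import math
import statistics

# Hypothetical connection records; field names are illustrative only
# and are not taken from SageWorks or any specific NSM tool.
records = [
    {"id": "c1", "duration": 0.5, "orig_bytes": 120},
    {"id": "c2", "duration": 1.2, "orig_bytes": 340},
    {"id": "c3", "duration": None, "orig_bytes": 200},
    {"id": "c4", "duration": 0.8, "orig_bytes": 95000},
    {"id": "c5", "duration": 1.0, "orig_bytes": 180},
    {"id": "c6", "duration": 0.9, "orig_bytes": None},
    {"id": "c7", "duration": 1.1, "orig_bytes": 260},
    {"id": "c8", "duration": 0.7, "orig_bytes": 150},
]


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(rows, columns):
    """Count missing values per column."""
    return {col: sum(is_missing(r.get(col)) for r in rows) for col in columns}


def quartiles(values):
    """Return (Q1, median, Q3) using linear interpolation over the data."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def iqr_outliers(rows, column, k=1.5):
    """Return ids of rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [r for r in rows if not is_missing(r.get(column))]
    q1, _, q3 = quartiles([r[column] for r in present])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r["id"] for r in present if r[column] < low or r[column] > high]


if __name__ == "__main__":
    print(missing_counts(records, ["duration", "orig_bytes"]))
    print(quartiles([r["orig_bytes"] for r in records
                     if not is_missing(r["orig_bytes"])]))
    print(iqr_outliers(records, "orig_bytes"))
```

On this sample, the single very large `orig_bytes` value is the only record flagged. On real NSM data, heavy-tailed fields such as byte counts may call for a log transform or a different threshold before a rule like this is useful.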

Lastly, we will reflect on the future development of SageWorks, emphasizing how its approach of leveraging AWS services provides an extensible set of modeling and data analysis functionality using existing AWS and third-party offerings.

Resources
- SageWorks GitHub: https://github.com/SuperCowPowers/sageworks
- SageWorks ML Pipeline: https://nbviewer.org/github/SuperCowPowers/sageworks/blob/main/notebooks/ML_Pipeline_with_SageWorks.ipynb

BSides Albuquerque Sessionize Event

September 2023 Albuquerque, New Mexico, United States
