Jean Joseph

Principal Data & AI Engineer @Tech-Insight-Group LLC

Newark, New Jersey, United States


Jean Joseph is a seasoned consultant and senior technical trainer specializing in data engineering and artificial intelligence, with a strong background in database design, administration, and cutting-edge data technologies including machine learning and generative AI.

He helps organizations build secure, scalable solutions across both legacy systems and modern cloud platforms. A former Microsoft MVP and senior technical trainer at Microsoft, Jean brings deep technical insight and a passion for teaching.

He’s also a dynamic speaker, mentor, and the founder of the Cloud Data Driven User Group and the Future Data Driven Summit, where he champions innovation and promotes responsible use of emerging tech within the data community.

Badges

Area of Expertise

Information & Communications Technology

Topics

Database
Big Data
Analytics and Big Data
Data Science
Azure Data Platform
Azure SQL Database
Data Management
Microsoft Data Platform
Azure Data Factory
Data Warehousing
Database Administration
Data Engineering
Data Platform
Azure Data Lake
Databases
All things data
Machine Learning and Artificial Intelligence
Microsoft (Azure) AI + Machine Learning
Agentic AI
Agentic AI architecture
Agentic AI / Autonomous Agents
Generative & Agentic AI

Natural Language Interface to Databases: A New Paradigm for Data-Driven Development

Database development is a complex and tedious process that requires extensive knowledge of database languages and schemas. However, with advances in natural language processing and generative AI, a new paradigm is emerging: natural language interfaces to databases (NLIDBs).

NLIDB allows users to interact with databases using natural language, translating business requirements into code and queries automatically.

In this session, you will learn how NLIDBs work, what the benefits and challenges of this approach are, and how you can leverage it to streamline your database development lifecycle.

You will also see a live demonstration of an NLIDB system that generates database code, such as schemas, tables, views, queries, stored procedures, and functions, as well as bulk-load scripts, from natural language prompts.

Join us to discover how natural language processing can unlock the potential of generative AI for data-driven development, where prompts become database code.
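A minimal sketch of the guard-rail side of an NLIDB workflow, assuming Python with an in-memory SQLite database standing in for the target system. The text-to-SQL model itself is left out: the commented `generate_sql` step is a hypothetical placeholder for whatever LLM call is used. The sketch shows only two ideas: grounding the prompt in the real schema, and compiling a generated statement with `EXPLAIN` so that unknown tables, syntax errors, and non-SELECT statements are rejected before anything runs.

```python
import sqlite3


def schema_ddl(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so the model sees real table and column names."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return ";\n".join(row[0] for row in rows) + ";"


def build_nl2sql_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Ground the request in the actual schema (the prompt wording is hypothetical)."""
    return (
        "Translate the business question into one SQLite SELECT statement.\n"
        f"Schema:\n{schema_ddl(conn)}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def is_safe_select(conn: sqlite3.Connection, sql: str) -> bool:
    """Accept only a single SELECT that SQLite can compile; nothing is executed."""
    if not sql.lstrip().upper().startswith("SELECT"):
        return False
    try:
        # EXPLAIN compiles the statement and returns its bytecode without running it.
        conn.execute("EXPLAIN " + sql)
        return True
    except (sqlite3.Error, sqlite3.Warning):
        # Unknown tables/columns, syntax errors, or multiple statements end up here.
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    print(build_nl2sql_prompt(conn, "How many customers are in Newark?"))
    # Stand-in for text returned by a hypothetical generate_sql(prompt) LLM call.
    candidate = "SELECT COUNT(*) FROM customers WHERE city = 'Newark'"
    print(is_safe_select(conn, candidate))               # True
    print(is_safe_select(conn, "SELECT * FROM orders"))  # False: no such table
```

Compiling first is cheap and catches hallucinated table or column names; a real deployment would still need permission checks, since a statement that compiles is not necessarily one the user should be allowed to run.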

Unraveling the Mysteries: A Comprehensive Study of ChatGPT and Its Transformer Backbone

Curious about how GPT (Generative Pre-trained Transformer) models generate human-like text? Join us for an insightful presentation that demystifies GPT's inner workings.

We'll explore the fundamentals of GPT, a transformer-based neural network architecture trained on vast text datasets and fine-tuned on conversational data. Learn how GPT processes prompts using its sophisticated transformer mechanism.

Dive into the specifics of GPT's vocabulary, including its size and the corpus it draws from. Understand the critical tokenization process that prepares input for the model.
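GPT models tokenize text with byte-pair encoding (BPE). The sketch below is a simplified, character-level illustration of how BPE learns merge rules from a corpus and then applies them to a new word. It assumes a tiny whitespace-split corpus, whereas GPT tokenizers operate on bytes and have vocabularies of tens of thousands of tokens; the tie-breaking rule (the first pair seen wins) is this sketch's own choice.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` in a symbol tuple with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(corpus, num_merges):
    """Learn merge rules by repeatedly fusing the most frequent adjacent symbol pair."""
    words = Counter(tuple(word) for word in corpus.split())  # word (as symbols) -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_symbols(symbols, best)] += freq
        words = new_words
    return merges


def tokenize(word, merges):
    """Apply the learned merges, in order, to split a new word into subword tokens."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


merges = learn_bpe("low low low lower lowest", 3)
print(merges)                      # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
print(tokenize("lowest", merges))  # ['lowe', 's', 't']
```

Frequent character sequences become single tokens, while rarer endings stay split into smaller pieces, which is how a fixed vocabulary can still represent unseen words.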

The core of our presentation will focus on GPT's decoder-only transformer structure, showing how it generates a response one token at a time, with each new token conditioned on the prompt and the tokens produced so far. We'll also cover the backpropagation algorithm, which is essential for training the model.
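GPT models are built from a stack of decoder blocks, and the operation at the heart of each block is masked (causal) self-attention. A minimal pure-Python sketch of single-head scaled dot-product attention with a causal mask, assuming the query, key, and value vectors have already been computed; the learned projection matrices, multiple heads, positional information, and feed-forward layers are all omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal (look-back-only) mask.

    queries, keys, values: lists of T vectors (lists of floats), one per token position.
    """
    d = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: position t scores only itself and the earlier positions 0..t.
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy vectors for three token positions
print(causal_self_attention(tokens, tokens, tokens))
```

Because position t never looks at later positions, the model can be trained to predict every next token in parallel and then generate text left to right at inference time.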

Additionally, we will emphasize the considerations necessary to improve the model's accuracy, ensuring reliable and precise text generation.

Finally, discover how GPT maintains a dialogue through feedback and conversation loops, and how Retrieval-Augmented Generation (RAG) enhances its capabilities.

By the end of this session, you'll gain a comprehensive understanding of GPT, offering valuable insights for those interested in natural language processing and AI development.

Learning objectives for the presentation:

Understand the fundamentals of GPT, including its transformer-based neural network architecture and how it generates human-like text.
Explore the specifics of GPT's vocabulary, its tokenization process, and the decoder-only transformer structure that powers its response generation.
Learn about the feedback and conversation loops in GPT, and the role of Retrieval-Augmented Generation (RAG) in enhancing its conversational capabilities.

Unlocking AI Potential with Prompt Engineering

"Unlocking AI Potential with Prompt Engineering" is a comprehensive exploration of prompt engineering in AI. This session delves into the impact of well-designed prompts on AI model performance, especially in natural language processing. Attendees will gain insights into prompt design, optimization strategies, and their influence on model interpretability and fairness.

The session will also shed light on various types of prompts, such as zero-shot, one-shot, few-shot, chain-of-thought (CoT), and retrieval-augmented generation (RAG) prompts, and their effect on model performance. It will take a detailed look at these techniques, including input format selection, task descriptions, and examples. Ideal for AI practitioners and data scientists, this session offers a unique perspective on maximizing AI effectiveness.
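A minimal sketch of how these prompt types differ mainly in what is assembled around the query, assuming a plain-text `Input:`/`Output:` layout (the labels, the helper name, and its parameters are illustrative, not a standard). Zero-, one-, and few-shot prompts vary the number of worked examples; the chain-of-thought option appends the widely used "Let's think step by step." cue; a RAG-style prompt prepends retrieved passages as context.

```python
def build_prompt(instruction, query, examples=(), context=(), chain_of_thought=False):
    """Assemble a text prompt.

    examples: (input, output) pairs; none = zero-shot, one = one-shot, several = few-shot.
    context: retrieved passages used to ground the answer (the RAG pattern).
    chain_of_thought: append a cue asking the model to reason step by step.
    """
    parts = [instruction.strip(), ""]
    if context:
        parts.append("Context:")
        parts += [f"- {passage}" for passage in context]
        parts.append("")
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    parts.append(f"Input: {query}")
    if chain_of_thought:
        parts.append("Let's think step by step.")
    parts.append("Output:")
    return "\n".join(parts)


task = "Classify the sentiment of the review as positive or negative."
zero_shot = build_prompt(task, "The demo was fantastic.")
few_shot = build_prompt(
    task,
    "The demo was fantastic.",
    examples=[("I loved every minute.", "positive"), ("Too slow and confusing.", "negative")],
)
print(few_shot)
```

Keeping the examples in exactly the format expected for the answer is a large part of what makes few-shot prompts effective, which is why the layout is generated by one function rather than written by hand each time.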

Rapid and Scalable ML with Azure ML Automated ML Model

Want to save time on training your machine learning models? Join my session to discover how Azure ML Automated ML empowers data scientists, analysts, and developers to build high-quality machine learning models with minimal effort and time.

This powerful tool automates the tedious and iterative tasks of model development, including data preprocessing, feature engineering, algorithm selection, hyperparameter tuning, and model evaluation. It supports a variety of machine learning tasks such as classification, regression, forecasting, computer vision, and natural language processing.

In this presentation, we will demonstrate how to use Azure ML Automated ML to create and deploy machine learning models for various scenarios, utilizing both the no-code UI and the Python SDK.

We will discuss the benefits and challenges of using Automated ML, along with best practices and tips for achieving optimal results. Finally, we will show how to interpret and explain the models generated by Automated ML using built-in responsible machine learning solutions.

Learning objectives for the session:

Understand how Azure ML Automated ML streamlines the machine learning model development process, including data preprocessing, feature engineering, algorithm selection, hyperparameter tuning, and model evaluation.
Learn to create and deploy machine learning models for various scenarios using both the no-code UI and the Python SDK in Azure ML Automated ML.
Explore best practices for achieving optimal results with Automated ML, and how to interpret and explain models using built-in responsible machine learning solutions.

End to End SQL Database Development: A Comprehensive guide with Interactive Prompts

This session is a technical deep-dive into leveraging the power of ChatGPT for T-SQL database development. It will explore how to use ChatGPT's prompt-based interaction model to streamline the process of writing, testing, and optimizing T-SQL queries.

Attendees will learn how to leverage ChatGPT and Python to generate complex T-SQL scripts, automate database tasks, and troubleshoot common T-SQL issues using natural language. The session will also cover best practices for integrating ChatGPT into your existing database development workflow.

This presentation is ideal for database developers, data analysts, and anyone interested in harnessing the power of AI for database development.

Exploring Sentiment Analysis using Azure AI Search for your Search Documents

In today's data-driven business landscape, organizations are constantly challenged with extracting meaningful insights from vast amounts of data. Sentiment analysis, a key aspect of natural language processing, has emerged as a powerful tool to understand customer sentiment and behavior. However, the process of implementing sentiment analysis can be complex and time-consuming, requiring expertise in machine learning and natural language processing.

Azure AI Search addresses these challenges by offering built-in AI enrichment skills, including the Sentiment skill, which simplifies the implementation of sentiment analysis. This cloud-based service allows businesses to easily build rich search experiences over their data, using AI-powered skills to enrich content with knowledge and insights. By leveraging Azure AI Search, businesses can automate the process of sentiment analysis, enabling them to make data-driven decisions more efficiently and effectively. This not only saves time and resources but also enhances the user experience by providing more relevant and personalized search results.

Attendees will learn how to leverage Azure AI Search’s powerful capabilities to perform sentiment analysis on large volumes of text data, interpret the results, and use these insights to make data-driven decisions. The session will also include practical demonstrations and use-cases, providing attendees with a hands-on understanding of the topic. This presentation is ideal for data scientists, AI practitioners, and anyone interested in enhancing their search documents with sentiment analysis.
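A sketch of what a skillset using the built-in Sentiment skill might look like, written as a Python dict for the Azure AI Search REST API. The skill type and its `text` input and `sentiment` output follow the V3 Sentiment skill as documented, but skill versions change, so check the current reference before relying on them. The skillset name and index field are hypothetical, and the Azure AI services resource attachment that billable enrichment normally requires is omitted.

```python
import json

# Hypothetical names: "reviews-sentiment" and the "sentiment" index field are placeholders.
skillset = {
    "name": "reviews-sentiment",
    "description": "Label each document's content with an overall sentiment",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.V3.SentimentSkill",
            "context": "/document",
            "defaultLanguageCode": "en",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "sentiment", "targetName": "sentiment"}],
        }
    ],
}

# Indexer fragment: copy the enriched value from the enrichment tree into an index field.
output_field_mappings = [
    {"sourceFieldName": "/document/sentiment", "targetFieldName": "sentiment"}
]

payload = json.dumps(skillset, indent=2)
print(payload)
```

Once the label is stored in a filterable index field, queries can filter or facet on sentiment alongside ordinary full-text search.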

Exploring Azure Cosmos DB for PostgreSQL

I will introduce the new Azure Cosmos DB for PostgreSQL service. You'll learn the benefits of distributed Postgres, including geo-replication across the Azure cloud, and how to scale smartly to support your growth and meet your performance needs.

In this session I will cover:
- Distributing tables in Azure Cosmos DB for PostgreSQL
- Data modeling for distributed Postgres in Azure Cosmos DB
- Creating tables in Azure Cosmos DB for PostgreSQL
- Loading data into Azure Cosmos DB for PostgreSQL
- Querying and updating tables in Azure Cosmos DB for PostgreSQL
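Azure Cosmos DB for PostgreSQL is powered by the Citus extension, where a statement such as `SELECT create_distributed_table('events', 'tenant_id');` shards a table on a distribution column. The sketch below is a simplified model of why that column choice drives data modeling: rows are assigned to shards by hashing the distribution column, so two tables distributed on the same key (here a hypothetical `tenant_id`) keep related rows on the same shard and can be joined without moving data. It uses `zlib.crc32` purely for illustration; Citus uses its own hash function and hash-range shard boundaries.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 32  # matches Citus's default citus.shard_count


def shard_for(value, shard_count=SHARD_COUNT):
    """Map a distribution-column value to a shard (crc32 stands in for Citus's hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % shard_count


def distribute(rows, column, shard_count=SHARD_COUNT):
    """Group rows by the shard their distribution-column value hashes to."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[column], shard_count)].append(row)
    return shards


def colocated_join(left_shards, right_shards, key):
    """Join shard by shard; correct only because both tables were hashed on the same key."""
    joined = []
    for shard_id, left_rows in left_shards.items():
        right_rows = right_shards.get(shard_id, [])
        for left in left_rows:
            for right in right_rows:
                if left[key] == right[key]:
                    joined.append({**left, **right})
    return joined


tenants = [{"tenant_id": t, "tenant_name": f"tenant-{t}"} for t in range(10)]
events = [{"tenant_id": e % 10, "event_id": e} for e in range(50)]
joined = colocated_join(distribute(tenants, "tenant_id"), distribute(events, "tenant_id"), "tenant_id")
print(len(joined))  # 50: every event found its tenant without crossing shards
```

If the two tables were distributed on different columns, matching rows could sit on different shards and the join would need data shuffled between nodes, which is the cost co-location avoids.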

Data Labeling in Azure Machine Learning: A Comprehensive Guide for Image and Text Data

Data labeling is a crucial step in any machine learning project, as it provides the ground truth for training and evaluating models. However, data labeling can also be a tedious, time-consuming, and error-prone task, especially for large and complex datasets. To address this challenge, Azure Machine Learning offers a data labeling tool that enables you to create, manage, and monitor data labeling projects with ease and efficiency.

In this presentation, you will learn how to:
- Use the data labeling tool in Azure Machine Learning to label image and text data for various machine learning tasks, such as classification, object detection, instance segmentation, semantic segmentation, and named entity recognition.
- Leverage machine learning-assisted data labeling and human-in-the-loop labeling to accelerate and improve the quality of your labeling process.
- Coordinate data, labels, and team members to efficiently manage labeling tasks and track progress.
- Review and export the labeled data as an Azure Machine Learning dataset for further analysis and modeling.
- Integrate the data labeling tool with other Azure Machine Learning services, such as MLflow, AutoML, and pipelines, to streamline and automate your machine learning lifecycle.

Join this session to discover how data labeling in Azure Machine Learning can help you prepare high-quality data for your machine learning projects.

The role of the DBA in NoSQL

What is the role of the database administrator (DBA) in the rapidly evolving world of NoSQL? Much of the early NoSQL adoption has been in fast-growing small and medium-sized companies built on public clouds. Most of these companies have no DBA role, which has led many people to proclaim the end of the DBA.

Join my session to examine a few marketplace trends that are likely to have a significant downstream impact on the technology workplace.

Microsoft Fabric Spark SQL Tutorial: An Introductory Guide for Beginners

"Microsoft Fabric Spark SQL Tutorial: An Introductory Guide for Beginners" is a detailed presentation designed to introduce beginners to the world of Spark SQL using Microsoft Fabric. This tutorial will guide you through the basics of Spark SQL, its integration with Microsoft Fabric, and how to use it to perform data analysis tasks.

By leveraging the scalability of Spark and the robustness of Microsoft Fabric, users can handle large datasets efficiently and extract valuable insights. This presentation is perfect for those new to Spark SQL or those looking to enhance their data processing capabilities with Microsoft Fabric. No prior experience with Spark SQL or Microsoft Fabric is required, making this an ideal starting point for beginners in the field.

Join us as we delve into the exciting world of data analysis with Spark SQL and Microsoft Fabric.

Learning objectives for the tutorial:

Understand the basics of Spark SQL and its integration with Microsoft Fabric for efficient data analysis.
Learn how to leverage the scalability of Spark and the robustness of Microsoft Fabric to handle large datasets and extract valuable insights.
Gain practical experience in performing data analysis tasks using Spark SQL within Microsoft Fabric, tailored for beginners with no prior experience.

Mastering Synapse Serverless SQL Pool

Are you looking for a way to do basic data exploration and transformation against your data lake files without provisioning an Apache Spark pool? If so, join my session to learn how you can use Azure Synapse serverless SQL pool to explore and transform the files in your data lake with T-SQL, and to create a logical data warehouse over those files.

Azure Synapse serverless SQL pool supports data exploration, transformation, and data warehousing, all through SQL. This session discusses how it works, what you can do with it, how it saves costs, and demos of common use cases. The agenda:

- Introduction to serverless SQL pool
- Why it is useful
- Is it the same as a dedicated SQL pool?
- Demos:
  - Data exploration
  - Data transformation
  - Building a logical data warehouse and accessing it with Power BI
  - Building a lake database

Data Processing Architecture: Key Design Principles & Considerations

In the era of big data, the design of data processing architecture is crucial for efficient data management and analysis. This presentation explores the fundamental principles and considerations essential for constructing robust data processing systems. Key design principles such as scalability, reliability, security, and flexibility are examined in detail.

The architecture's ability to handle varying data flows, ensure data integrity, and maintain security across multiple stages is emphasized. Additionally, the presentation discusses various architectural patterns, including data warehouses, data lakes, and data flow pipelines, highlighting their respective use cases and benefits.

Furthermore, the presentation contrasts traditional data processing architecture with the emerging concept of data mesh. While traditional architectures focus on centralized data processing and transformation, data mesh advocates for a decentralized approach, promoting domain-oriented data ownership and self-serve data infrastructure.

This comparison underscores the shift from monolithic data management to a more flexible and scalable architecture, addressing the diverse needs of modern data-driven organizations.

By adhering to these principles and considerations, data engineers can create systems that not only meet current data processing needs but are also adaptable to future technological advancements and data requirements.

Learning objectives for the presentation:

Understand the fundamental principles of designing robust data processing architectures, focusing on scalability, reliability, security, and flexibility.
Explore various architectural patterns, including data warehouses, data lakes, and data flow pipelines, and their respective use cases and benefits.
Compare traditional data processing architectures with the emerging concept of data mesh, highlighting the shift towards decentralized, domain-oriented data ownership and self-serve data infrastructure.

Advanced SQL For Data Scientists

I will begin with a brief overview of SQL, then cover the five major topics a data scientist should understand when working with relational databases: basic statistics in SQL, data preparation in SQL, advanced filtering and data aggregation, window functions, and preparing data for use with analytics tools.
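Of the five topics, window functions benefit most from a concrete example. A minimal sketch, assuming a small hypothetical `sales` table, run on SQLite (window functions need SQLite 3.25 or later, which current Python builds bundle) so that it is self-contained; the same `OVER (PARTITION BY ... ORDER BY ...)` syntax works in SQL Server and PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month INTEGER, amount REAL);
INSERT INTO sales VALUES
  ('East', 1, 100), ('East', 2, 150), ('East', 3, 120),
  ('West', 1, 200), ('West', 2, 180), ('West', 3, 220);
""")

query = """
SELECT region,
       month,
       amount,
       -- running total within each region, in month order
       SUM(amount) OVER (PARTITION BY region ORDER BY month)          AS running_total,
       -- 1 = best month for that region
       RANK()      OVER (PARTITION BY region ORDER BY amount DESC)    AS amount_rank,
       -- change from the previous month (NULL for the first month)
       amount - LAG(amount) OVER (PARTITION BY region ORDER BY month) AS month_over_month
FROM sales
ORDER BY region, month
"""

for row in conn.execute(query):
    print(row)
```

Unlike `GROUP BY`, the window functions keep every input row and attach the per-region calculation to it, which is what makes running totals, rankings, and period-over-period changes a single query.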

Running Microsoft SQL Server on Amazon RDS

Are you looking for a way to deploy, operate, scale, and monitor highly available Microsoft SQL Server workloads on Amazon RDS? Then come and learn the best practices and considerations for deploying and operating Microsoft SQL Server on Amazon RDS.

Learning Objectives:
* How to deploy, operate, and scale a SQL Server database in minutes with cost-efficient, resizable hardware capacity.
* Best practices for deploying and maintaining Microsoft SQL Server workloads on Amazon RDS.
* How to use a client application to connect to RDS for SQL Server to store and query data.
* How to configure RDS features such as high availability, backup, monitoring, and security.

Python for SQL Server DBAs & Developers

Nowadays, SQL Server database administrators and developers need expertise beyond SQL and PowerShell.

You will learn the fundamentals of Python, how to use it to retrieve and manipulate data from SQL Server, and how Python can serve as both an ETL tool and an administration tool.
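A minimal extract-transform-load sketch. The functions accept any Python DB-API 2.0 connection, so against SQL Server you would typically pass a connection from a driver such as `pyodbc.connect(...)`; to keep the example self-contained and runnable, two in-memory SQLite databases stand in for a hypothetical source and target. Both sqlite3 and pyodbc use `?` parameter placeholders, so the load statement carries over; the table and column names are illustrative.

```python
import sqlite3


def extract(conn, query):
    """Pull rows from the source through any DB-API 2.0 connection."""
    cur = conn.cursor()
    cur.execute(query)
    return cur.fetchall()


def transform(rows):
    """Trim names, normalize emails and country codes, and drop rows without an email."""
    cleaned = []
    for customer_id, name, email, country in rows:
        if not email:
            continue
        cleaned.append(
            (customer_id, (name or "").strip(), email.strip().lower(), (country or "").strip().upper())
        )
    return cleaned


def load(conn, rows):
    """Insert the transformed rows into the target table in one batch."""
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO dim_customer (customer_id, name, email, country) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)  # len() rather than rowcount, which some drivers report as -1 here


# Two in-memory SQLite databases stand in for a SQL Server source and target.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT, email TEXT, country TEXT);
INSERT INTO customers VALUES
  (1, ' Ana Silva ', 'Ana@Example.com', 'us'),
  (2, 'Ben Ito', NULL, 'jp'),
  (3, 'Cy Doe', 'cy@example.com ', NULL);
""")
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT, email TEXT, country TEXT)")

loaded = load(target, transform(extract(source, "SELECT customer_id, name, email, country FROM customers")))
print(loaded)  # 2
```

Keeping extract, transform, and load as separate functions makes each step testable on its own and lets the same transform run against a different source or target connection.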

Diversity, Equity and Inclusion Panel -- Navigating the Storm

The world is changing fast around us, and we need to learn to adapt. As part of that, we need to pay more attention to diversity and inclusion in our workplaces and our community. The road that leads to equality isn’t smooth, but we can all play an important role in making it smoother. In this panel discussion, we will discuss topics that will help you navigate issues around diversity and inclusion at work and in the data community, learn about the struggles of your peers, and find out how you can help.

Cleaning and Transforming Data with SQL

One of the first tasks in any data analytics or data science project is cleaning the dataset you’re working with. The insights you draw from your data are only as good as the data itself, so it’s no surprise that an estimated 80% of the time spent by analytics professionals goes into preparing data for analysis.

You’ll learn techniques for cleaning messy data in SQL, a must-have skill for any data analyst or data scientist. I will also discuss and demo functions commonly used to clean, transform, and remove duplicate data from query outputs that are not in the form we need.
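A minimal sketch of a typical cleanup query, assuming a hypothetical `raw_customers` table: `TRIM` and `LOWER` standardize values, `COALESCE` fills in missing ones, and `ROW_NUMBER()` keeps one row per email address. It runs on SQLite so it is self-contained; the same pattern works in SQL Server 2017 and later, where `TRIM` became available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_customers (id INTEGER, name TEXT, email TEXT, city TEXT);
INSERT INTO raw_customers VALUES
  (1, '  Ana Silva ', 'ANA@EXAMPLE.COM',  'Newark'),
  (2, 'Ana Silva',    'ana@example.com ', NULL),
  (3, 'Ben Ito',      'ben@example.com',  'Jersey City'),
  (4, 'Cy Doe',       NULL,               'Newark');
""")

cleanup_query = """
WITH standardized AS (
    SELECT id,
           TRIM(name)                AS name,   -- strip stray spaces
           LOWER(TRIM(email))        AS email,  -- one canonical form per address
           COALESCE(city, 'Unknown') AS city    -- replace missing values
    FROM raw_customers
    WHERE email IS NOT NULL                     -- drop rows that cannot be matched
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id DESC) AS rn
    FROM standardized
)
SELECT id, name, email, city
FROM ranked
WHERE rn = 1                                    -- keep the newest row per email
ORDER BY id
"""

for row in conn.execute(cleanup_query):
    print(row)
```

Standardizing before deduplicating matters: without `LOWER(TRIM(email))`, rows 1 and 2 would look like different customers and both would survive.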

Analyzing SQL Server Query Plans

Query performance troubleshooting requires significant expertise in understanding query processing and execution plans, in order to be able to actually find and fix root causes.

You will learn how to read SQL Server execution plans correctly, identify performance bottlenecks in your database, and explore how to resolve them.
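In SQL Server you would read the graphical plan in SSMS or capture it with `SET SHOWPLAN_XML ON`. To keep this sketch self-contained and runnable, it swaps in SQLite's `EXPLAIN QUERY PLAN` as a stand-in. The operators are named differently (SQLite reports `SCAN` and `SEARCH ... USING INDEX` where SQL Server shows scan and Index Seek operators), but the habit is the same: find the full scan, add a supporting index, and confirm that the plan changes. The table and index names are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row (column 3)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer_id = 42"
before = plan(conn, query)  # full table scan: every row is read
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)   # index search: only matching rows are touched
print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so the point to look for is the switch from a scan to a search that names the new index.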

From Housekeeping to Data Engineer - My journey to find my passion

Join me to learn about how I moved to the US from Haiti and worked my way up from a simple housekeeping job to becoming a data engineer.

Getting Started With Azure Synapse

In this presentation, I will introduce the Azure Synapse architecture, its components, and the features that support every stage of data implementation and processing, along with some best practices and pitfalls. I will explain in detail the three methods for distributing data: round-robin (the default), hash, and replicated. We will go from ingesting data into data lakes, to transforming it with big data services, to applying machine learning models, including data remodeling. Finally, I will demo a full implementation of Azure Synapse all the way through to presentation and reporting.
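In a dedicated SQL pool the method is chosen per table, for example `WITH (DISTRIBUTION = HASH(customer_id))`, `ROUND_ROBIN`, or `REPLICATE`, and every table is split across 60 distributions. The sketch below is a simplified model of what each option does to the rows, assuming hypothetical sales rows; `zlib.crc32` stands in for Synapse's internal hash, and round-robin is modeled as dealing rows out in order.

```python
import zlib

DISTRIBUTIONS = 60  # a dedicated SQL pool splits each table into 60 distributions


def round_robin(rows):
    """ROUND_ROBIN: spread rows evenly, ignoring their values (modeled as i % 60)."""
    buckets = [[] for _ in range(DISTRIBUTIONS)]
    for i, row in enumerate(rows):
        buckets[i % DISTRIBUTIONS].append(row)
    return buckets


def hash_distribute(rows, column):
    """HASH(column): rows with equal column values always land in the same distribution."""
    buckets = [[] for _ in range(DISTRIBUTIONS)]
    for row in rows:
        # crc32 is a stand-in; Synapse uses its own internal hash function.
        buckets[zlib.crc32(str(row[column]).encode("utf-8")) % DISTRIBUTIONS].append(row)
    return buckets


def replicate(rows, compute_nodes):
    """REPLICATE: every compute node holds a full copy (suited to small dimension tables)."""
    return {node: list(rows) for node in range(compute_nodes)}


sales = [{"sale_id": i, "customer_id": i % 7} for i in range(1000)]
rr = round_robin(sales)
hashed = hash_distribute(sales, "customer_id")
print(max(map(len, rr)) - min(map(len, rr)))  # 1: an even spread
print(sum(1 for b in hashed if b))            # at most 7 busy distributions: skew
```

The second print shows a common pitfall: hashing on a column with only a handful of distinct values leaves most distributions empty, so a good hash column has many distinct, evenly used values and is frequently used in joins.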

Identify SQL Server database performance issues

This session is for those who want to know how to quickly pinpoint SQL Server database performance issues. I will explain and demo how poor SQL Server configuration, open transactions, blocking, statistics, disk I/O, insufficient memory or CPU, too few or too many indexes, and badly written T-SQL scripts affect performance.
After this session you should be able to quickly identify the top performance issues in your databases.

Deploying SQL Server Docker Container

I will cover a basic introduction to SQL Server Docker containers: how to deploy a SQL Server Docker container using standalone scripts, a Dockerfile, and a Docker Compose file, along with ways to persist data in a SQL Server Docker container and how to handle disaster recovery.

NoSQL For The DBAs

Businesses are quickly moving to NoSQL databases to power their modern applications. One reason is that they want to host a relatively unmodified RDBMS schema on a NoSQL database and then optimize it over time.

This presentation will show DBAs and SQL developers how to achieve the benefits of NoSQL within their environments. You’ll learn how to migrate a table-based data model to JSON documents, tweak your queries for relational JSON data, and create indexes to support fast query performance. We will also cover the risks involved in using NoSQL.
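A minimal sketch of the first migration step, assuming two hypothetical relational tables, `orders` and `order_lines`: child rows that are always read with their parent are embedded in the parent document, so the foreign-key join disappears. Field names such as `customerId` are illustrative.

```python
import json
from collections import defaultdict

# Hypothetical relational rows: orders(order_id, customer_id, order_date)
# and order_lines(line_id, order_id, sku, qty).
orders = [(1, "C001", "2024-01-05"), (2, "C002", "2024-01-06")]
order_lines = [(10, 1, "SKU-1", 2), (11, 1, "SKU-7", 1), (12, 2, "SKU-1", 5)]


def to_documents(orders, order_lines):
    """Embed each order's lines inside the order document, replacing the join."""
    lines_by_order = defaultdict(list)
    for _line_id, order_id, sku, qty in order_lines:
        lines_by_order[order_id].append({"sku": sku, "qty": qty})
    return [
        {
            "id": str(order_id),
            "customerId": customer_id,
            "orderDate": order_date,
            "lines": lines_by_order[order_id],
        }
        for order_id, customer_id, order_date in orders
    ]


for doc in to_documents(orders, order_lines):
    print(json.dumps(doc))
```

Embedding suits child data that is bounded in size and read together with its parent; child collections that grow without limit are usually better kept as separate documents that reference the parent.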

Emotional Intelligence at Work: An Introduction

No one really cares how smart or witty you are if you can't get along with people. What is more important than cultivating & maintaining significant relationships?

In today’s dynamic workplace, emotional intelligence (EI) is a crucial skill that can significantly enhance personal and professional success. This session will introduce the core concepts of emotional intelligence and its importance in the workforce. Participants will learn how to recognize and manage their own emotions, understand and influence the emotions of others, and build stronger, more collaborative relationships.

We will explore practical techniques for developing EI, including self-awareness, self-regulation, motivation, empathy, and social skills. By the end of this session, attendees will have a deeper understanding of how to apply emotional intelligence to improve communication, leadership, and teamwork in their professional lives.

Join my session to learn the five domains of emotional intelligence, which will help you recognize and understand your own emotions, manage your own emotions, manage your own motivation, recognize emotions in others, and effectively manage others’ emotions.

What is a Data Pipeline? Architecture and Best Practices

Building a data pipeline architecture based on best practices brings the biggest rewards. As a data pipeline developer, you should design the architecture of your pipelines so that they adapt easily to future needs and are easy to evaluate when issues arise.

Join my session to learn the best strategies for building and managing a robust data pipeline that lets you rapidly integrate new datasets into your petabyte-scale data store.
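One practice that keeps a pipeline easy to evaluate when something goes wrong is to run records through small, named stages and quarantine failures instead of aborting the whole run. A minimal sketch of that idea, assuming in-memory records and hypothetical `parse`, `validate`, and `enrich` stages; a production pipeline would write rejected records to a dead-letter table or queue rather than to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Stage = Tuple[str, Callable[[dict], dict]]


@dataclass
class PipelineResult:
    loaded: List[dict] = field(default_factory=list)
    # (original record, failing stage, error message), kept for later inspection
    rejected: List[Tuple[dict, str, str]] = field(default_factory=list)


def run_pipeline(records: Iterable[dict], stages: List[Stage]) -> PipelineResult:
    """Push each record through named stages; a failure quarantines only that record."""
    result = PipelineResult()
    for record in records:
        current, failed = record, False
        for name, stage in stages:
            try:
                current = stage(current)
            except Exception as exc:  # broad on purpose: any stage error quarantines the record
                result.rejected.append((record, name, str(exc)))
                failed = True
                break
        if not failed:
            result.loaded.append(current)
    return result


def parse(record):
    return {**record, "amount": float(record["amount"])}


def validate(record):
    if record["amount"] < 0:
        raise ValueError("negative amount")
    return record


def enrich(record):
    return {**record, "amount_cents": round(record["amount"] * 100)}


result = run_pipeline(
    [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "abc"}, {"id": 3, "amount": "-1"}],
    [("parse", parse), ("validate", validate), ("enrich", enrich)],
)
print([r["id"] for r in result.loaded])                    # [1]
print([(r["id"], stage) for r, stage, _ in result.rejected])  # [(2, 'parse'), (3, 'validate')]
```

Because each rejection records the stage that failed, a new dataset with unexpected values shows up as a list of specific, inspectable failures rather than a crashed job.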

Building End-To-End Modern Data Warehouse with Azure Synapse Analytics

Are you looking for a comprehensive, unified platform to build a modern data warehouse from end to end? If so, join my session to learn how to implement end-to-end analytics solutions using Azure Synapse SQL and Spark pools.

How to Make Your Machine Learning Models More Interpretable and Explainable

Machine learning models are often seen as black boxes that produce predictions without revealing the underlying logic or reasoning. This can pose challenges for trust, accountability, fairness, and debugging, especially in high-stakes domains such as healthcare, finance, or security.

In this presentation, we will introduce the concepts of interpretability and explainability in machine learning, and discuss why they are important for both developers and users of machine learning systems. We will also review some of the techniques and tools that can help make machine learning models more interpretable and explainable, such as feature importance, partial dependence plots, LIME, SHAP, and counterfactual explanations.

We will demonstrate how to apply these techniques and tools to different types of models, such as linear models, tree-based models, and deep neural networks, using examples from real-world applications. Finally, we will highlight some of the challenges and limitations of existing methods, and suggest some directions for future research and practice.
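Permutation feature importance is one model-agnostic way to compute the feature importance mentioned above: shuffle one feature's values, and the more the model's error grows, the more the model relied on that feature. Because it only needs predictions, it applies equally to linear, tree-based, and neural models. A minimal pure-Python sketch, assuming a toy model and synthetic data; scikit-learn ships a production version as `sklearn.inspection.permutation_importance`.

```python
import random


def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in error when one feature's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mean_squared_error(model, X_shuffled, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy "black box": depends strongly on feature 0, weakly on feature 1, not at all on feature 2.
def model(row):
    return 3.0 * row[0] + 0.5 * row[1]


X = [[float(i), float((7 * i) % 20), float((3 * i) % 11)] for i in range(20)]
y = [model(row) for row in X]
print(permutation_importance(model, X, y))
```

The scores rank feature 0 far above feature 1, and feature 2 scores exactly zero because the model never reads it; with real models, correlated features can share or mask importance, which is one of the limitations such methods carry.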

Most Active Speaker
