Session

Big data from the trenches: Demystifying the buzzword

The term big data might sound old-fashioned in the age of deep learning and thinking machines. Guess what: it's even more important than before.

Have you ever faced the dilemma of nvarchar(4000) vs nvarchar(max)? Have you ever been given two datasets or arrays and asked whether they are similar? Can you recall at least one automated process, run from a bash or PowerShell script, where some parts failed silently, and how hard it was to troubleshoot by digging through tons of log files?

These problems also occur in big data applications in one form or another, at far greater scale and complexity. Drawing on lessons and experience from real-world big data applications, this session shows how such challenges are managed in the big data world.

You will learn about file formats used to support storage of petabytes of data while also allowing quick search. You will see how statistics and data visualisation are used to inspect inputs or confirm conclusions about final outputs. You will appreciate the value of orchestration and monitoring tools used to manage sophisticated processes that span days or even weeks. You will learn how tools like Spark are designed to recover from failures in such distributed environments built on commodity hardware.

By the end of this session, you will have a high-level understanding of how big data applications are developed and operated, and of where to start your big data journey with tools like Hadoop, Hive and Spark.

Yousry Mohamed

Lead Consultant at Cuusoo

Brisbane, Australia
