Session

External Tables - Not *Just* Loading a CSV File

- Elevator Pitch -

With external tables you can SELECT from files - great for loading plain CSV files, but it can do so much more than that. Learn about loading from text files, DataPump dump files, HDFS and HIVE, as well as how to add constraints and partitioning for optimizer optimizations of the external tables.

- Description -

External tables are a way to map data from files in the OS to columns in a rowsource - in effect allowing you to SELECT directly from a file. This can be useful in many ways allowing you to for example join the external data to lookup tables in the database, as well as use the complete range of SQL syntax and functions to transform the external data if needed.

From version to version, Oracle database has added functionality to the external table, so it now is capable of so much more than simply loading a CSV file. You can load data from flat text files, from binary DataPump dump files, from HDFS (Hadoop Distributed File System) or from Apache HIVE. You can even do a limited unloading to a DataPump dump file. You can have the loading process perform an OS preprocess call, like for example unzip to allow you to load from a compressed file.

Oracle 12.2 also adds the possibility of adding certain metadata to the external table useful for the optimizer. You can define constraints (primary key, referential, unique, not null) for the optimizer to trust and rely on, and you can define partitions to allow partition pruning to read only the necessary files of a multi-file external table.

This presentation discusses the various possibilities and shows examples of syntax and use cases for the various access drivers - ORACLE_LOADER, ORACLE_DATAPUMP, ORACLE_HDFS and ORACLE_HIVE.

- Notes -

Topics covered include:

• Access drivers and parameters: ORACLE_LOADER, ORACLE_DATAPUMP, ORACLE_HDFS and ORACLE_HIVE.
• Location: directories and files, HDFS URIs.
• Datatype conversion, characterset handling, delimited and fixed-length data.
• Error handling, logging, bad file.
• SQL*Loader comparison and use as code generator.
• XMLTAG not parsing but text tag search.
• PREPROCESSOR to manipulate or generate datafile.
• Multiple files, parallelism and partition pruning.
• Trusted, relied constraints on external tables.
• CSV and ALL FIELDS OVERRIDE.
• ORACLE_DATAPUMP reading dump files.
• ORACLE_DATAPUMP writing dump files with CTAS.
• HDFS/HIVE parameters for column mappings, overflow handling and more.

Presented at:
- ODTUG Kscope 2018
- Trivadis TechEvent 2018
- DOAG 2019
- NoCOUG online 2020
- RoOUG online 2021
- nlOUG 2022
- OUGN 2022

Kim Berg Hansen

Lead Software Developer, Cegal Danmark A/S | #Oracle #SQL #PLSQL #SYM42

Middelfart, Denmark

Actions

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Jump to top