When 7-bit ASCII ain't enough - about NLS, Collation, Charsets, Unicode and such

- Elevator Pitch -

A-Z in upper and lower case isn’t enough in a global world, you need to support all sorts of textual data. Oracle database can handle most of it, but there are pitfalls to be aware of - even using Unicode that should handle everything. Learn about NLS and collation and converting charactersets.

- Description -

How life as a DB developer would be much easier if the entire world was happy with just A-Z in upper and lower case. But the globalized world is a very diverse place with myriads of special characters and modifiers, different alphabets, writing directions, sorting and comparison rules, and many ways to encode these characters. So your database and application absolutely needs to know about these things and handle them correctly, or you’ll find your app displaying garbage or worse, storing garbage.

Oracle database offers many NLS (National Language Support) settings to handle such globalization on database, instance and session/client level. NLS functions are available in the SQL language to handle specific cases with code when the session settings aren’t granular enough. In Oracle 12.2 was added Collation support, where you specify right down on column level how this column should sort and compare values. You have choices whether to store everything with database characterset UTF8 or stick to a single-byte characterset - in both cases be aware of how conversion takes place if the client is not using the database characterset.

This presentation discusses many of the issues involved and ways to use NLS and Collation to avoid them. Pitfalls will be shown, where you can get mangled (or even invalid) data stored if you are not careful.

- Notes -

Topics covered include:

• Character sets and encodings. ASCII, ISO, UTF and more.
• Database character set - what can be used for identifiers.
• National character set - alternative for using Unicode in DB with single-byte DB charset.
• NLS settings of the database versus instance versus session (and client.)
• Plug-compatible charsets in multi-tenant environment.
• Circumstances leading to invalid characters in the database.
• BYTE versus CHAR length semantics.
• Linguistic sorting and matching - NLS_SORT, NLS_COMP.
• Data-bound collation - defaults, column level and PL/SQL units.
• Database Migration Assistant for Unicode (DMU) for changing DB charset.

Presented at:
- OUGN 2018
- ODTUG Kscope 2018
- POUG 2018
- Trivadis TechEvent 2018
- AIOUG online 2020
- DOAG online 2020
- hrOUG 2021

Kim Berg Hansen

Lead Software Developer, Cegal Danmark A/S | #Oracle #SQL #PLSQL #SYM42

Middelfart, Denmark


Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Jump to top