Schlomo Schapiro
Agile IT & Open Source Enthusiast
Berlin, Germany
Schlomo is a DevOps and Open Source expert who likes to bridge business goals with IT. Tech enthusiast reducing systemic complexity. Change agent with deep technical and analytical understanding. Innovating a better Internet for the open knowledge society.
Area of Expertise
Topics
Quality guarantee for contingency plans
When was the last disaster test? How much did it cost, and can we really trust our plans now? A contingency plan with guaranteed quality and an SLA would be much better than a test run once a year with a lot of manual effort that might not even work in a major crisis.
What would a quality guarantee or SLA for contingency plans look like? Especially in today's IT landscapes, with their typical mix of on-prem and SaaS applications, and in view of the lessons learnt from the pandemic, a contingency plan ‘on paper’ is no longer sufficient. Which time-consuming activities can be shifted to the preparation phase and developed in a test-driven manner, so that ideally the disaster exercise runs automatically every Monday as an integration test?
What guarantees for emergencies does the company need in order to fulfil its requirements? How do you take a holistic view, from the user to the communication structure and from the desktop and server hardware to the applications? How can you set up an affordable ‘hot standby’ system for office communication and the most important business applications, so that in the event of a ransomware attack the company can still communicate, work at all and keep its most important topics running?
We present the concept of a ‘No Restore Solution’ and show practical examples of test-driven development for emergency preparedness.
The idea is to not restore anything in an emergency, because restoring takes a lot of time and may not work when it really matters. Instead, prepared system images that already contain up-to-date data are switched on and used. Thanks to virtualisation, cloud and SaaS solutions, such concepts can now be implemented very cost-effectively and with high quality.
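What the weekly automated check of such prepared images could look like is sketched below in Python. This is a minimal sketch under stated assumptions, not the method presented in the talk: it assumes that each standby system reports the timestamp of its last data sync and the result of an automated test boot. The `StandbySystem` model, the `check_standby` function and the 24-hour limit are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StandbySystem:
    """Hypothetical model of a prepared standby image."""
    name: str
    last_data_sync: datetime  # timestamp of the newest data replicated into the image
    boots_ok: bool            # result of an automated test boot in an isolated network


def check_standby(system: StandbySystem, now: datetime, max_data_age: timedelta) -> list[str]:
    """Return the problems found; an empty list means the image can be switched on as-is."""
    problems = []
    if not system.boots_ok:
        problems.append(f"{system.name}: test boot failed")
    age = now - system.last_data_sync
    if age > max_data_age:
        problems.append(f"{system.name}: data is {age} old, limit is {max_data_age}")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    standby = [
        StandbySystem("mail", now - timedelta(hours=2), boots_ok=True),
        StandbySystem("erp", now - timedelta(days=3), boots_ok=True),
    ]
    # Collect the problems of all standby systems into one report.
    report = [p for s in standby for p in check_standby(s, now, timedelta(hours=24))]
    print("\n".join(report) or "all standby systems ready")
```

When run on a schedule from CI, a non-empty report would fail the job. The failed check then replaces the yearly surprise during the disaster test.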
Relax and Recover (ReaR): Automated Linux Recovery & Open Source Project
Relax and Recover (ReaR) (https://relax-and-recover.org) is an Open Source project that was born in 2006 out of a single customer's need and a consulting project, to offer a cheaper and better alternative to a commercial disaster recovery product for Linux. Since then it has grown through many contributions to cover nearly any disaster recovery situation for Linux servers, desktops and laptops, and it is used in many data centres around the world. Red Hat and SUSE even provide commercial support, with their package maintainers also acting as ReaR maintainers. ReaR is also the de-facto standard for Linux bare metal restore and the only tool that leverages existing backups for disaster recovery purposes.
This talk by the founder and maintainer of ReaR gives an introduction to automating Linux disaster recovery and shows how to install, configure and use ReaR. It covers the basics as well as advanced subjects like design, architecture, development and extending ReaR with custom code.
Finally, we also showcase ReaR as an Open Source project with nearly 20 years of experience and share our successes and the challenges ahead. We hope to encourage others to also publish solutions as Open Source and are happy to share our experience.
With all this success, we, the Open Source project, still struggle to provide regular releases, test automation or even good architecture documentation.
This talk by the founder and maintainer of ReaR introduces the tool, explores the reasons for these challenges and shows some approaches that worked and some that didn't. I'd be happy to discuss with other maintainers and projects how they solve this problem.
Compliant by Default – How DB Systel does Continuous Delivery!
Learn about the journey of Deutsche Bahn towards Cloud computing, DevOps and agile transformation, with special focus on our Continuous Delivery strategy and implementation. After a brief overview of what is happening at DB Systel, we will show our Continuous Delivery as-a-Service (CDaaS) approach. CDaaS is an integrative approach to Continuous Delivery that ensures governance and security compliance while being fully focused on the user experience. We will show the extensibility and simplicity of CDaaS and how it helps DevOps teams improve code quality.
Key takeaways are a profound understanding of the intimate relationship between DevOps, Continuous Delivery and Cloud, which enables a truly integrated work environment for our developers. By putting *Developer Productivity* first, we ensure that our teams can focus on developing their features rather than on choosing the right tool or knowing all platform topics in depth.
Automated Governance - Building a Compliant by Default Environment
How to combine the Kubernetes Resource Model, GitOps and automated compliance certification to create a compliant-by-default work environment for developers.
A highly automated continuous deployment environment creates a whole new world of challenges for companies to meet their compliance and governance requirements. Traditional, manual processes can't keep up with quick and frequent releases.
The solution to this conflict of interests is the automation of all compliance checks and the automated certification of every software delivery into production. Sounds obvious and simple, but it is difficult to implement.
The talk shows how we tackle this topic at Deutsche Bahn and how we create solutions for automated compliance certification for cloud platforms and Kubernetes. It starts from the theoretical background, goes over implementation details and concludes with an IT strategy based on this approach.
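The sketch below makes the idea of automating compliance checks more concrete. It is a minimal Python example that checks a Kubernetes Deployment, already parsed into a dict, against three example rules. The rules, the registry name `registry.example.com` and the function name are assumptions for illustration, and they do not describe the checks used at Deutsche Bahn. In a GitOps pipeline such a check could run for every change, and its empty result could be recorded as evidence for certifying that delivery.

```python
# Hypothetical policy: only images from this registry may run in production.
APPROVED_REGISTRIES = ("registry.example.com/",)


def compliance_violations(manifest: dict) -> list[str]:
    """Check a Kubernetes Deployment manifest (parsed into a dict) against example rules."""
    violations = []
    if manifest.get("kind") != "Deployment":
        return violations  # this sketch only has rules for Deployments
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        if not image.startswith(APPROVED_REGISTRIES):
            violations.append(f"{name}: image {image!r} is not from an approved registry")
        if container.get("securityContext", {}).get("privileged", False):
            violations.append(f"{name}: privileged containers are not allowed")
        if "limits" not in container.get("resources", {}):
            violations.append(f"{name}: no resource limits set")
    return violations


if __name__ == "__main__":
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "shop"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "app", "image": "docker.io/library/nginx:1.25",
             "securityContext": {"privileged": True}},
        ]}}},
    }
    for violation in compliance_violations(deployment):
        print(violation)
```

Because the rules are code, they can be versioned, reviewed and tested like any other software, so the compliance check keeps up with the release frequency.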
Lifting the Curse of Static Credentials
Why do we still login with username and password almost everywhere in the age of crypto passports and 50€ hardware tokens?
Static credentials of all kinds (passwords, permanent tokens, SSH keys ...) are a major hazard in IT. A lot of engineering effort goes into securely managing secrets. Still, companies fail utterly in this area (see "Instagram's Million Dollar Bug").
It is essential to eradicate such static credentials wherever possible. Digital identities, access control lists and trust relationships are the modern tools that make our services secure and our lives as engineers easier.
Come and learn from practical examples and specific recommendations for on-premise data centers, desktops and cloud environments that you can instantly use at home and at work. Practical examples include AWS identity integration for Kubernetes or for GitLab CI.
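A practical first step towards eradicating static credentials is finding them. The following minimal Python sketch flags AWS access key IDs in text such as config files or CI variables, and it is no replacement for a dedicated secret scanner. It relies on the documented AWS key ID prefixes. `AKIA` marks the long-term access key of an IAM user. `ASIA` marks temporary credentials issued by AWS STS. Temporary keys expire on their own and are therefore not reported. The function name is hypothetical, and the sample keys are the example values used in the AWS documentation.

```python
import re

# AWS access key IDs: 4-character prefix plus 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(AKIA|ASIA)[A-Z0-9]{16}\b")


def find_static_aws_keys(text: str) -> list[tuple[int, str]]:
    """Return (line number, key ID) for every long-term AWS access key ID in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            if match.group(1) == "AKIA":  # long-term key: a static credential
                findings.append((lineno, match.group(0)))
            # "ASIA" keys are temporary STS credentials and are ignored here
    return findings


if __name__ == "__main__":
    config = (
        "[default]\n"
        "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
        "[ci]\n"
        "aws_access_key_id = ASIAIOSFODNN7EXAMPLE\n"
    )
    for lineno, key_id in find_static_aws_keys(config):
        print(f"line {lineno}: static access key {key_id}")
```

Each finding should be fixed by replacing the key with an identity-based mechanism, for example the AWS identity integration for Kubernetes or GitLab CI. A better place to store the key does not solve the problem.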
Stealing data from public or shared cloud environments is a rising threat that has already put companies out of business. Putting all our assets into public or shared clouds removes the layer of physical security on which traditional security concepts are based. One of the root causes of weak security in cloud environments is static credentials.
This talk raises awareness of this problem and provides proven solutions to it. It lays out a security strategy that significantly reduces the risk of being hacked while increasing convenience for all users and developers.
See A Login Security Architecture Without Passwords (https://schlomo.schapiro.org/2022/02/login-security-architecture-without-passwords.html) for more background info.
The audience is anyone interested in security and modern IT environments. DevOps engineers and others can learn how to use and set up modern authentication systems with security in mind. Users, administrators and decision makers will learn why eradicating static credentials is one of the most important challenges in modern IT.
This presentation will help everyone better understand the connection between ease of automation, static credentials and general security design. In line with Rugged DevOps and Lean Security, these are very important topics that help everybody build better systems that are secure by nature and impossible to hack through stolen secrets, because there are no secrets that can be stolen.
Kubernetes and other cloud platforms already provide advanced features that allow us to build large environments without static credentials. Concrete examples show the integration of AWS, Kubernetes and on-premise data centers.
Continuous Versioning
Q: How should I version my software? A: Automated!
Every piece of code, config, or other artefact that we deploy somewhere has a version. With Kubernetes, Cloud Native and public clouds, continuous delivery is the standard, and manually crafting version numbers doesn't scale.
This talk discusses various approaches to automating the definition of version strings for software, configuration and other artefacts. The goal is always to combine a maximum of automation with meaningful version strings that help DevOps to quickly understand what is deployed where.
* What are the requirements for versions?
* Which systems in the build and delivery pipeline significantly contribute to a version?
* How to integrate semantic versioning with continuous versioning?
* Examples of simple and complex setups
The most important takeaway is to automate the generation of version strings as much as possible. In the world of continuous delivery it becomes possible to see versions as technical numbers without any attached emotions or special meaning.
Version numbers should be cattle, not pets.
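The Python sketch below shows one possible way to combine semantic versioning with continuous versioning. It is a minimal example with a hypothetical scheme and is not necessarily the one from the linked blog post. Humans maintain only MAJOR.MINOR. The CI build number becomes the PATCH level. The git commit hash is added as SemVer build metadata.

```python
import re

SEMVER_BASE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)")  # MAJOR.MINOR without leading zeros
COMMIT_HASH = re.compile(r"[0-9a-f]{7,40}")


def continuous_version(base: str, build_number: int, commit: str) -> str:
    """Combine a human-maintained MAJOR.MINOR with data from CI into a SemVer string.

    base:         "MAJOR.MINOR", bumped manually only for meaningful changes
    build_number: monotonically increasing counter from the CI system
    commit:       full or abbreviated git commit hash (lowercase hex)
    """
    if not SEMVER_BASE.fullmatch(base):
        raise ValueError(f"base must look like MAJOR.MINOR, got {base!r}")
    if build_number < 0:
        raise ValueError("build_number must not be negative")
    if not COMMIT_HASH.fullmatch(commit):
        raise ValueError(f"not a git commit hash: {commit!r}")
    # The CI build number becomes the PATCH level, so every build gets a new,
    # ordered version. The commit goes into the build metadata after "+", which
    # SemVer ignores for precedence but which makes every artefact traceable.
    return f"{base}.{build_number}+g{commit[:7]}"


if __name__ == "__main__":
    print(continuous_version("1.4", 237, "3f2a1c9d0e8b"))
```

In GitLab CI, for example, the predefined variables `CI_PIPELINE_IID` and `CI_COMMIT_SHA` could serve as inputs. This scheme deliberately gives PATCH a purely technical meaning, which fits the idea of treating versions as cattle.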
See Meaningful Versions for Continuous Everything (http://blog.schlomo.schapiro.org/2017/08/meaningful-versions-with-continuous.html) for more information.
DevOps is Normal!
DevOps is normal - or isn’t it? Who can claim DevOps normalcy for their company in good conscience?
While most DevOps discussions focus on the how, Schlomo asks when DevOps is normal and what needs to happen before everybody thinks so.
Starting with a new DevOps definition, this talk suggests a simple chain of arguments that compares the DevOps transformation with learning how to drive. Just as it is normal nowadays for adults to have a driver’s license, DevOps should be normal in IT. This analogy also works as an elevator pitch to convince management that the DevOps approach should be normal.
The DevOps definition allows the audience to win every DevOps argument and thereby turn DevOps into the new normal. Examples illustrate new ways to introduce DevOps into an existing company.
Kubernetes: Shifting the mindset from servers to containers
With Kubernetes pods and containers, several fundamental assumptions of server operations no longer apply. Some Linux services like SSH even disappear and are provided by Kubernetes instead.
This talk explores the mindset shift that developers and admins of Linux servers have to make in order to take full advantage of the power of a Kubernetes cluster:
* Servers turn into pods
* Linux application services turn into containers
* Standard services like cron and SSH disappear completely
* Separating initialization, run and maintenance phases
* Building pods with multiple containers that work together
* Scaling out into multiple pods instead of scaling up with more CPU power
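Several of these points come together in the minimal Python sketch below. It builds a multi-container pod as a plain manifest and prints it as JSON, which `kubectl apply -f` also accepts. The pod name, the images and the shell commands are assumptions for illustration. An init container prepares data before the application starts, taking the place of an init script. The application container is the run phase. A sidecar does periodic housekeeping in a shared `emptyDir` volume, taking the place of a cron job.

```python
import json


def build_pod(name: str, app_image: str, tools_image: str) -> dict:
    """Build a hypothetical pod with separate init, run and maintenance containers."""
    shared = {"name": "data", "mountPath": "/data"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "volumes": [{"name": "data", "emptyDir": {}}],
            # Initialization phase: runs to completion before the app containers start.
            "initContainers": [{
                "name": "init-data",
                "image": tools_image,
                "command": ["sh", "-c", "echo prepared > /data/ready"],
                "volumeMounts": [shared],
            }],
            "containers": [
                # Run phase: the actual application service.
                {"name": "app", "image": app_image, "volumeMounts": [shared]},
                # Maintenance phase: housekeeping loop instead of a cron job on the server.
                {"name": "cleanup", "image": tools_image,
                 "command": ["sh", "-c", "while true; do rm -f /data/*.tmp; sleep 3600; done"],
                 "volumeMounts": [shared]},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_pod("shop", "registry.example.com/shop:1.0", "busybox:1.36"), indent=2))
```

Scheduled tasks that should run independently of a pod's lifetime are better modelled with the Kubernetes CronJob resource. The sidecar pattern fits housekeeping that belongs to one running instance.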
Through practical examples from real migration projects, participants gain a new understanding of the role of services, init scripts, cron jobs and other standard Linux components. Key takeaways are a better understanding of how to model a complex system on top of Kubernetes and practical tips for migrating servers into Kubernetes containers.
Successfully adopting Kubernetes requires a big change in how developers and admins think about servers, bigger than any change before, including the one brought by VMs. This talk shows why it pays to change traditional concepts and to embrace the new world of Linux service modularization that Kubernetes stands for.
See Using Kubernetes with Multiple Containers for Initialization and Maintenance (http://blog.schlomo.schapiro.org/2017/06/using-kubernetes-with-multiple.html) for more information.
Root for All - Measuring DevOps Adoption
DevOps is about culture and mindset more than about technology - but how do you measure success? How do you know if your company really "does" DevOps?
It turns out that root access to production servers is not only the proverbial holy grail but actually serves as a fact-based measure for the trust and automation levels in an organization.
This talk explores the connection between root access and automation on the one hand and DevOps mindsets, cross-functional teams and shared responsibility on the other. Based on practical experience at Zalando and at ImmobilienScout24, two major web companies in Berlin, the talk provides concrete suggestions for achieving true DevOps happiness. As a result, you will know why in the end there is no harm at all in granting root access to everybody.
Key takeaways are solid arguments that you can use to convince your boss and your peers to take a different approach to root access, demonstrating how shared responsibility really works.
See Root for All - A DevOps Measure? (http://blog.schlomo.schapiro.org/2017/06/root-for-all-devops-measure.html) for more background information.
Solved: SSH Security vs. Automation
Are you still ignoring WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! messages? Do you use “ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null” to make SSH connections? How can you trust SSH keys in an environment where you install 10 new servers every day, and where a server lives less than a day on average?
Many how-tos and articles talk about SSH security but fail to put it into the context of managing large data centers or cloud environments with a high degree of automation. This talk covers the fundamentals of SSH security and shows advanced usage scenarios such as:
* How to differentiate between human-to-machine and machine-to-machine communication and how to optimize SSH for each
* Best practices for establishing trust relationships between servers or user accounts
* When to use host-based authentication instead of user keys
* When you can use SSHFP to put SSH host key fingerprints into DNS, and when it won't work
* Several ways to centrally manage the /etc/ssh/ssh_known_hosts file, as suggested by the SSH man page
* Introduction to using the SSH PKI with CA certificates (introduced in OpenSSH 5.4) to simplify host key management in large environments
* When it is better to not use SSH but rsh or other remote execution tools
There is a special focus on automated environments and different strategies for handling new servers or frequent reinstallations of existing servers.
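Two of the topics above are sketched in the minimal Python example below, where the function names are hypothetical. The first function computes an SSHFP DNS record with a SHA-256 fingerprint from a host's public key, similar to what `ssh-keygen -r` prints. The second writes an `@cert-authority` line for a centrally managed `/etc/ssh/ssh_known_hosts`. That line makes clients trust every host certificate signed by the CA, so newly installed servers need no individual known_hosts entries. The algorithm numbers are taken from the SSHFP RFCs 4255, 6594 and 7479.

```python
import base64
import hashlib

# SSHFP algorithm numbers (RFC 4255, RFC 6594, RFC 7479).
SSHFP_ALGORITHM = {
    "ssh-rsa": 1,
    "ssh-dss": 2,
    "ecdsa-sha2-nistp256": 3,
    "ecdsa-sha2-nistp384": 3,
    "ecdsa-sha2-nistp521": 3,
    "ssh-ed25519": 4,
}
SHA256_FINGERPRINT = 2  # SSHFP fingerprint type for SHA-256 (RFC 6594)


def sshfp_record(hostname: str, public_key_line: str) -> str:
    """Build an SSHFP record from a public key line, e.g. ssh_host_ed25519_key.pub."""
    key_type, key_b64 = public_key_line.split()[:2]
    # The fingerprint is taken over the raw key blob, i.e. the base64-decoded field.
    digest = hashlib.sha256(base64.b64decode(key_b64)).hexdigest()
    return f"{hostname} IN SSHFP {SSHFP_ALGORITHM[key_type]} {SHA256_FINGERPRINT} {digest}"


def cert_authority_line(host_pattern: str, ca_public_key_line: str) -> str:
    """Build a known_hosts entry that trusts all host certificates signed by this CA."""
    key_type, key_b64 = ca_public_key_line.split()[:2]
    return f"@cert-authority {host_pattern} {key_type} {key_b64}"


if __name__ == "__main__":
    # Dummy Ed25519 key for illustration only: length-prefixed type + 32 zero bytes.
    blob = b"\x00\x00\x00\x0bssh-ed25519" + b"\x00\x00\x00\x20" + bytes(32)
    key_line = "ssh-ed25519 " + base64.b64encode(blob).decode() + " demo"
    print(sshfp_record("web01.example.com.", key_line))
    print(cert_authority_line("*.example.com", key_line))
```

OpenSSH clients only consult SSHFP records when `VerifyHostKeyDNS` is enabled. The records are only trustworthy if the DNS zone is signed with DNSSEC, which is one of the cases where SSHFP won't work.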
See also Embedding SSH Key in SSH URL (http://blog.schlomo.schapiro.org/2017/05/embedding-ssh-key-in-ssh-url.html), Automated OpenSSH Configuration Tests (http://blog.schlomo.schapiro.org/2014/04/automated-openssh-configuration-tests.html), and SSH with Personal Environment (http://blog.schlomo.schapiro.org/2014/02/ssh-with-personal-environment.html).
Cloud & Offline Secrets Management
Operational secrets continue to play an important role in production, yet they are a major security risk and hard to handle. Putting all secrets into cloud offerings, key vaults or secrets managers can lock you out in a disaster.
Solving all of these problems, including backup and disaster recovery, is possible via the Open Source tool "SOPS - Secrets OPerationS". This talk shows why SOPS is one of the best tools for this job and provides solutions for online and offline backup and disaster recovery coverage.
Learn how to solve the foundational problems around secret management in modern environments, including access control and disaster recovery for online (cloud) and offline (data centre) backups.
The Role of GitOps in IT Strategy
What is the role of GitOps in IT strategy? This talk gives an overview and puts GitOps into the context of current challenges in IT strategy.
Main aspects are continuous delivery, policy as code, automated governance, compliant-by-default work environments, acceptable means of compliance and a comprehensive automation of all development and operations related processes with the goal of true hands-off operations.
The result places GitOps as a major building block of any modern IT strategy. GitOps helps build essential IT capabilities. It creates the motivation to truly “fix the basics” through sustainable solutions that enable higher level automation. With GitOps, engineers can focus much more on business value and spend less effort on boring IT topics.
Why is there no new Release? Nobody pays for the basics :-(
The Relax-and-Recover (ReaR) Open Source project has existed since 2006 and is now the de-facto standard for automated Linux Disaster Recovery, shipping with all Linux distros.
A behind-the-scenes look at how such a project works over such a long time, what goes well and what could be better.
ReaR was born out of a single customer's need and a consulting project, to offer a cheaper and better alternative to a commercial disaster recovery product. Since then it has grown through many contributions to cover nearly any disaster recovery situation for Linux servers and desktops, and it is used in many data centres around the world. Red Hat and SUSE even provide commercial support, with their package maintainers also acting as ReaR maintainers.
With all this success, we still struggle to provide regular releases, test automation or even good architecture documentation.
This talk explores the reasons for that and shows some approaches that worked and some that didn't. I'd be happy to discuss with other maintainers and projects how they solve this problem.
Mission Impossible: SaaS Backup & DR
SaaS and Cloud solutions are very practical. But what happens after a major disaster, for example if the vendor has lost all my data? Or disables my access after a “disagreement”, or simply following a court order or an undisclosed security service order? In order not to be caught with our pants down, we must implement a solid backup & disaster recovery concept.
But many Cloud and SaaS solutions don’t support full data recovery: During “backup” we lose some data and with the “restore” we even lose functionality. Especially with services that assign primary IDs, it is typically impossible to restore the data onto the same original primary ID.
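When a SaaS service assigns primary IDs itself, a restore can only create new objects. A common workaround is to restore parents before children, keep a mapping from old to new IDs and rewrite references along the way, as the minimal Python sketch below shows. The record layout with `id` and `parent_id` is an assumption, and so is the `create` callable, which stands in for the vendor API. This only preserves references inside the restored data. Anything outside it that still points to the old IDs stays broken, which is exactly the kind of loss described above.

```python
from itertools import count
from typing import Callable


def restore_with_new_ids(records: list[dict], create: Callable[[dict], str]) -> dict[str, str]:
    """Restore backed-up records into a service that assigns its own IDs.

    records: objects with an "id" and an optional "parent_id" reference (assumed layout)
    create:  hypothetical stand-in for the vendor API; creates an object, returns its new ID
    Returns the mapping old ID -> new ID, which should be kept for later fix-ups.
    """
    id_map: dict[str, str] = {}
    pending = list(records)
    while pending:
        progressed = False
        for rec in list(pending):  # iterate over a copy so we can remove from pending
            parent = rec.get("parent_id")
            if parent is not None and parent not in id_map:
                continue  # parent not restored yet, try again in the next pass
            payload = {k: v for k, v in rec.items() if k != "id"}
            if parent is not None:
                payload["parent_id"] = id_map[parent]  # rewrite reference to the new ID
            id_map[rec["id"]] = create(payload)
            pending.remove(rec)
            progressed = True
        if not progressed:
            raise ValueError("unresolvable references: " + ", ".join(r["id"] for r in pending))
    return id_map


if __name__ == "__main__":
    new_ids = count(1000)

    def fake_create(payload: dict) -> str:
        # Simulates the SaaS assigning a fresh primary ID on creation.
        return f"n{next(new_ids)}"

    backup = [{"id": "b", "parent_id": "a", "title": "child"}, {"id": "a", "title": "root"}]
    print(restore_with_new_ids(backup, fake_create))
```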
While not at the forefront of typical DevOps stories, backup and disaster recovery are an important part of “day 2” problems and should also be part of the “DevOps contract” within an organisation.
Schlomo introduces the problem space based on real-world examples and shows solution approaches for how we can still truly own our data and keep our ability to act, even after a major disaster.
See also https://schlomo.schapiro.org/2022/04/mission-impossible-complete-google-workspace-disaster-recovery.html for an example of this problem.
2020 All Day DevOps Sessionize Event
microXchg 2018 Sessionize Event
microXchg 2019 Sessionize Event