Ricardo Castro

Senior Site Reliability Engineer @ DefinedCrowd

Senior Site Reliability Engineer at DefinedCrowd. MSc in Computer Science from the University of Porto, working daily to build high-performance, reliable, and scalable systems. DevOps Porto meetup co-organizer and DevOpsDays Portugal co-organizer. A strong believer in culture and teamwork. Passionate about open source, amateur taekwondo practitioner, and metal lover.

Current sessions

Treat your infrastructure code with respect

Infrastructure as code (IaC) is the management and provisioning of infrastructure through machine-readable definition files. It’s the practice of defining network rules, spinning up servers, and configuring servers and services, among other things, through code rather than manual changes, in a repeatable and idempotent way.

In what shape is your infrastructure code documentation? Do you have architecture diagrams for your infrastructure layout? How do you test those changes? How good is your code coverage? Is your IaC automated?

In short, what’s your Software Development Life Cycle (SDLC) for Infrastructure/Operations code?


Need help? Come to the Backstage

Backstage solves a problem — infrastructure complexity — that’s common to a lot of large and growing companies today, with the ability to simplify tooling and standardize engineering practices.

It’s an open platform for building developer portals, powered by a centralized service catalog, which restores order to microservices and infrastructure. Product teams can ship high-quality code quickly — without compromising autonomy.


Linkerd to the rescue

What is a service mesh and when is it useful?

A service mesh is a dedicated infrastructure layer for making service-to-service communication safe, fast, and reliable. If you’re building a cloud native application, you need a service mesh. It can help connect, secure, control, and observe services. At a high level, a service mesh helps reduce the complexity of these deployments and ease the strain on your development teams.

How does Linkerd provide observability by collecting metrics and telemetry about services? How does Linkerd use mTLS to create secure lines of communication between services? How does Linkerd improve reliability by using retries, timeouts, load balancing, and traffic shifting?


GitOps: yea or nay?

GitOps is a paradigm, or a set of practices, that empowers developers to perform tasks which typically (only) fall under the purview of operations. It’s a way to do Kubernetes cluster management and application delivery by using Git as a single source of truth for declarative infrastructure and applications. With Git at the center of delivery pipelines, engineers use familiar tools to make pull requests that accelerate and simplify both application deployments and operations tasks on Kubernetes.

GitOps software agents (e.g. ArgoCD, Flux and Jenkins X) can alert on any divergence between Git and what's running in a cluster, and if there's a difference, Kubernetes reconcilers automatically update or roll back the cluster depending on the case.

This talk will include a demo showing how to configure and use ArgoCD/Flux/Jenkins X to accelerate and simplify application deployments.


Building a scalable logging platform

Logs are a critical part of any system. They give you deep insights into your application, what your system is doing, and what caused the error when something goes wrong. Virtually every system generates logs in some form or another, and these logs are written somewhere, probably to files on local disks. When you’re building large-scale applications, your system spans multiple hosts, and managing the logs across those hosts can be complicated. Debugging errors in such applications, across hundreds of log files on hundreds of servers, can be very time-consuming. Without the right framework and tools, the debugging process can be a nightmare. How do you go about building a scalable logging platform that can evolve with your needs over time?


What the Service Mesh?!

Cloud platforms provide a wealth of benefits for the organizations that use them. There’s no denying, however, that adopting the cloud can put strains on DevOps teams. Developers must use microservices to architect for portability, while operators manage extremely large hybrid and multi-cloud deployments.

A service mesh is a dedicated infrastructure layer for making service-to-service communication safe, fast, and reliable. If you’re building a cloud native application, you need a service mesh. It can help connect, secure, control, and observe services. At a high level, a service mesh helps reduce the complexity of these deployments and ease the strain on your development teams.

This talk will start by introducing what a service mesh is and what its main concepts are. It will then dive deep into one service mesh implementation, Istio, introducing its main concepts and architecture and demoing its use in a conceptual “real” scenario. Concepts like Traffic Control (Canary Deployments, Dark Launches, Egress), Service Resilience (Load Balancing, Timeouts, Retries, Circuit Breaking, Pool Ejection), Chaos Testing (HTTP Errors, Delays), Observability (Tracing, Metrics), and Security (Blacklist, Whitelist) will be explored in this talk.


KDD: Kubernetes Driven Development

Kubernetes is becoming the standard in container orchestration, but its tooling still lags behind that of more mature environments. Even so, the community has rallied behind this technology and has started to develop tools to help develop, manage, and deploy Kubernetes-ready applications. This talk aims to introduce tools like Telepresence, which improves development by allowing easy interfacing with Kubernetes-deployed services; Skaffold, which automates builds and deploys; and Helm, which manages deploys. The talk will also include a live demo of developing an application, showing all of those tools at work.