This is an opportunity to build next-generation infrastructure, drive impact across an organization of 100+ engineers, and grow within Web3.
Our developer productivity tech stack runs on the cutting edge to solve unique scaling problems and accelerate development velocity under large and growing complexity.
Our tech stack includes technologies such as Bazel, Kubernetes, and on-prem hardware.
As our software engineering teams and code base continue to grow, so does the need to match and increase the effectiveness of our development environment and CI/CD pipelines.
As a member of the Internal Developer Experience (IDX) team, you will join our effort to provide our colleagues with world-class continuous integration, development environments, and productivity tools, giving them an unstoppable foundation on which to build the Internet Computer.
Our team members have extensive backgrounds in Site Reliability Engineering (SRE), Developer Operations (DevOps), and Software Engineering.
- Propose and drive cross-team projects that enable teams to build and test their code effectively on our infrastructure, staying ahead of their needs as the organization scales
- Define SLOs and drive observability for critical services
- Extend and support our CI/CD system and its underlying infrastructure, which process tens of thousands of jobs per week
- Define and collect metrics that show where development time is lost and where we can further optimize our pipelines to increase velocity
- Reduce complexity and automate tedious tasks to simplify and speed up the user experience
- Evaluate and integrate new technologies that improve efficiency to scale our infrastructure as the team and our codebase grow
- Seek a balance between streamlining tool usage and empowering engineers to use the tool of their choice
- Offload compute-heavy tasks to our data centers
- Provide support for the systems and tools maintained by the team
- Draw lessons from incidents and support requests to improve our systems and services
- Improve documentation and discoverability to empower engineers to solve future problems independently
- Strong problem-solving, communication, and software engineering skills
- Experience understanding developer needs and proposing designs for solutions that optimize their experience
- Experience deploying and operating critical high-availability systems, including monitoring, alerting, SLOs, and the runbooks and tooling required to keep them healthy
- Preferably you have built reliable systems with tools that we use on a daily basis:
  - CI/CD: GitLab, GitHub
  - Metrics, monitoring and visualization: Honeycomb, Elastic Stack, Prometheus and Grafana
  - OS, containerization and orchestration: Linux, Docker, Kubernetes
  - Programming languages: Golang, Python, Bash and an interest in Rust
- A Bachelor’s degree in Computer Science or a closely related field. The unique challenges at DFINITY have attracted many engineers with advanced degrees; however, your practical experience is more important to us than your educational background.
What kind of engineers are we looking for?
- You should demonstrate a passion for building quality software and systems; understand tradeoffs between simplicity and complexity, and balance the needs of today vs the requirements of tomorrow
- You should know how to engage with engineers to understand their current and future problems, weigh the tradeoffs, and propose and drive pragmatic solutions
- You are a team player who enjoys collaborating with other brilliant people and knows how to reach consensus to drive a project forward
- You seek to understand the impact of your solution on the customer, in our case our engineering teams, and want to optimize their experience continuously