Colm Austin

Senior Site Reliability Engineer at Hostelworld Group
Contact Information
us****@****om
(386) 825-5501
Location
Greater Dublin, IE
Languages
  • English


Bio


5.0 / 5.0, based on 2 ratings
  • (2)
  • (0)
  • (0)
  • (0)
  • (0)


LinkedIn User

Colm started during difficult times, with a lot of change going on, loose ends, 3 different tech stacks and in the middle of several migrations, as the first EMEA-based incident manager 4.5 years ago. Despite all those adversities, and thanks to Colm's outstanding experience and deep technical understanding, he crushed it. I am truly impressed by his analytical skills, working on an issue and driving it to resolution. He stays calm and laser-sharp focused under tons of pressure, directing developers and service owners, engaging others to help, and communicating at a first-class level with executives up to the CEO. During post-mortems/incident reviews, Colm helped teams to get better and prevent issues from happening again by pushing them to their limits. He is also working on improving and updating existing processes, as well as forming new ones, to make everyone's life easier. Simply put, he never disappoints, which made him an invaluable asset to the entire org. Colm is a stellar employee you want on every team. Needless to say, I'd work with him again any time!

Mark D

I worked closely with Colm when we were growing Daft.ie in its early days. Colm often acted as my primary DevOps interface, fielding endless technical, commercial and product demands with patience, insight and humour. The multi-disciplinary demands of an early-stage startup growing at light speed are something many have not experienced. Colm's knowledge, perspective and experience are therefore, I believe, invaluable assets for any team with similarly ambitious goals.


Credentials

  • Certified Incident Responder
    PagerDuty
    Oct 2020 - Nov 2024
  • AWS Certified Cloud Practitioner
    Amazon Web Services (AWS)
    Feb 2020 - Nov 2024

Experience

    • Ireland
    • Travel Arrangements
    • 200 - 300 Employees
    • Senior Site Reliability Engineer
      • Oct 2022 - Present

    • United States
    • Technology, Information and Internet
    • 700 & Above Employees
    • Manager, Incident Management EMEA
      • Jul 2021 - Sep 2022

      • Managing a team of Reliability Engineers responsible for monitoring, troubleshooting and driving resolution on all global impacting production incidents using ITIL best practices.
      • Working with multiple engineering teams to identify and remove system bottlenecks, potential points of failure and gaps in observability as part of post incident and operational readiness reviews.
      • Setting team objectives and goals. Holding regular 1-1 check-ins and mentoring team members to help them achieve their goals. Carrying out end of year performance reviews.
      • Interviewing, hiring and onboarding engineers.
      • Working in an Agile environment with a Hybrid Cloud platform - AWS, Kubernetes, Docker, RDS, Nginx, HAProxy, Redis, Memcache, MySQL, PostgreSQL, GIT, Jenkins based CI/CD.

    • Technical Duty Officer (TDO)
      • Jun 2015 - Jun 2021

      • Worked on a team of designated Reliability Engineers for all major site impacting incidents. Driving incidents to their resolution; including handling any documentation, notifications, escalations, and executive communications.
      • Responsible for daily change management to minimise risk and change collision while not slowing down product changes. Working as a point of contact with multiple team leads scheduling critical infrastructure changes to avoid or minimise user impact.
      • Reviewing incidents for common or chronic problems/issues that either impact a service or the ability to monitor, manage, or mitigate impact to services. Chronic, stubborn, or recurring problems are identified, documented, triaged, and assigned for resolution in Jira.
      • Worked to shape best practices in the Reliability Org. Including defining and improving policies, procedures, and processes used by the overall Engineering org.
      • Created and led a mentoring program to identify, train and onboard individuals with the right mix of technical and soft skills to grow the team into a 24/7 follow the sun model.

    • Technology, Information and Internet
    • 1 - 100 Employees
    • Lead System Administrator
      • Oct 2013 - May 2015

      • Ultimate responsibility for the infrastructure of Daft.ie, Adverts.ie and Boards.ie across multiple data centres and cloud providers.
      • Leading a team of system administrators of varying experience and skill levels. This involved detailed project planning with the team as a whole, as well as individual goal setting, reviews and performance management sessions.
      • Worked closely with the Group Financial Controller to plan and implement cost saving measures resulting in a 55% saving.
      • Led a cross discipline team to develop a quality monitoring system allowing for the monitoring and active targeting of low quality listings on site.
      • Designed and implemented a new more granular server monitoring system (Graphite, Cabot, Grafana).
      • SLA monitoring and reporting to senior management team including: Uptime, page render speeds, response times.

    • Senior System Administrator
      • Oct 2011 - Oct 2013

      • Designed and implemented a dynamic, high availability media serving system for all the Distilled Media group sites, supporting the varying requirements of each site while running on a cost effective platform (AWS, Autoscaling, EC2, S3, Varnish, WebP, SPDY).
      • Implemented a group wide firewall and load balancing layer with the ability to cache and serve static content (FreeBSD, Varnish, PF, Nginx).
      • Involved in the infrastructure design for a number of the group's websites (Thejournal.ie, Boards.ie, Adverts.ie), each facing new and unique challenges due to its differing traffic patterns.
      • Involved in yearly roadmaps and project outlining, reporting to the group CTO.
      • Responsible for mentoring and upskilling junior workmates.

    • System Administrator
      • Jun 2011 - Oct 2011

      • Leading PCI compliance projects which involved implementing best practice security features.
      • Migrated to a Jenkins/Git Continuous Integration, automated testing, and deployment system.
      • Implemented Puppet based server configuration management system.
      • Set up system monitoring and alerting to find and resolve system bottlenecks (Nagios, Munin).
      • Project managed multiple office infrastructure moves to accommodate a fast growing team (PBX, Samba domain controller, incremental backups).

    • Ireland
    • Online Audio and Video Media
    • 1 - 100 Employees
    • System Administrator
      • Jun 2006 - Jun 2011

Education

  • Maynooth University
    Computer Science and Software Engineering
    2001 - 2006
