This is a Virtualization Server Hosting Engineering position in Enterprise Technology.
Virtualization Hosting Service Enablement
Capacity Management
- Conduct capacity planning and forecasting for the platforms, including Compute/Virtual Machine (VM), memory, storage, and network resources, to ensure scalability and prevent resource exhaustion
- Analyze resource utilization trends and make recommendations for infrastructure scaling, consolidation, or optimization
- Collaborate with application teams and stakeholders to understand future demand and project capacity needs
- Develop and maintain capacity models and reports to support strategic planning
Automation & Efficiency
- Develop automation solutions (scripts, playbooks) for repetitive VMware/OSV tasks, including configuration changes, VM management (such as snapshot removal), auditing, remediation, and integration with ticketing systems
- Leverage automation to deliver operator updates and changes efficiently at scale
- Implement Site Reliability Engineering (SRE) principles and practices to improve overall platform stability, performance, and operational efficiency
- Role-Based Access Control (RBAC) deployment and auditing
- Namespace and resource quota management (CPU, disk and storage)
Observability, Monitoring, Logging and Troubleshooting
- Implement and maintain comprehensive end-to-end observability solutions (monitoring, logging, tracing) for the VMware/OSV environment, including integration with tools such as Dynatrace and Prometheus/Grafana
- Explore and implement Event-Driven Architecture (EDA) for enhanced real-time monitoring and response
- Develop capabilities to flag and report abnormalities and identify "blind spots" in observability
- Perform deep-dive Root Cause Analysis (RCA), using available tooling where applicable, to quickly identify and resolve issues across the global compute environment
- Pinpoint the unhealthy component (the "needle in a haystack") anywhere in the global compute environment to shorten time to resolution
- Proactively monitor VM health, resource usage, and performance metrics
- Monitor for unusual activity that might indicate a compromise or misconfiguration
Solution Design & Consulting
- Provide technical consulting and expertise to application teams requiring VMware/OSV solutions
- Design, implement, and validate custom or dedicated OSV clusters and VM solutions for critical applications with unique or complex requirements (e.g., specialized appliances)
Knowledge Management
- Create, maintain, and update comprehensive internal documentation and customer-facing content to facilitate self-service and clearly articulate platform capabilities
Support
- Provide L1 to L3 support to Operations teams for environment-related issues. Monthly after-hours and weekend work will be required
- Responsible for engineering, deployment, operational administration, and protection of global enterprise solutions that meet Virtualization Server Hosting requirements for a variety of infrastructure systems, line-of-business, and third-party application needs
- Architect, design, and support the installation and administration of the virtualization suite of products (VMware and OpenShift Virtualization (OSV))
- Responsible for the entire lifecycle of technologies globally, including infrastructure security vulnerability patching, planning, design, implementation, maintenance, upgrades, and decommissioning of hardware and software
- Engineer, test, and document procedures, monitoring, logging, disaster recovery processes, and security policies and guidelines
- Demonstrate knowledge of hardware and software products
- Research industry best practices and trends
- Support global, large-scale deployments of virtualization technologies
- Support HPE Synergy/ProLiant iLO and firmware field testing
Required Qualifications:
- Bachelor's degree in computer science, Information Technology, or a related field, or equivalent practical experience
- 10+ years of total IT experience, including 4+ years in IT infrastructure, with at least 3 years of demonstrable experience focused on VMware, Kubernetes and/or OpenShift
- Understanding of VMware and Kubernetes concepts
- Experience with Linux administration and networking fundamentals
- Proficiency in scripting languages for automation
- Experience with monitoring tools and logging solutions
- Understanding of virtualization concepts and technologies (e.g., KVM, VMware)
- Excellent problem-solving skills and the ability to troubleshoot complex issues across multiple layers of the stack
- Knowledge of CI/CD pipelines and DevOps methodologies
- Strong communication and collaboration skills
- Self-starter who goes where the work is and looks for opportunities to evolve services

