Join Verinext, a technology company that's not just keeping up with the future, but actively shaping it. At Verinext, we firmly believe that work should be as enjoyable as it is rewarding. As an IT Operations Engineer, you'll be stepping into an environment that thrives on innovation and fun. Our team-oriented culture isn't just a buzzword; it's a cornerstone of our success. We're incredibly proud to have been recognized as a "Best Place to Work" by the Philadelphia Business Journal for 10 consecutive years.
We are seeking a talented and experienced IT Operations Engineer to join our dynamic team. The ideal candidate will lead the configuration of monitoring across multiple systems, integrate various tools and APIs, and develop automation scripts to enhance our IT operations. This role requires a strong technical background in infrastructure monitoring, systems engineering, and scripting.
Key Responsibilities:
Administration of Tools:
- Manage and maintain monitoring tools to ensure optimal performance and reliability.
- Act as the primary administrator for LogicMonitor, configuring alerts, thresholds, and escalation paths.
Tool and API Integration:
- Implement and manage integrations between various IT tools and APIs, including server, virtualization, and network platforms.
- Develop and maintain API integrations for Zerto, Commvault, VMware, and other key systems.
Data Aggregation:
- Aggregate and analyze data from various sources to support infrastructure monitoring and incident response.
Monitoring Configuration and Enhancement:
- Act as the technical lead for configuring monitoring across servers (Windows/Linux), virtualization (VMware/Hyper-V), networking (firewalls, switches, routers), storage (SAN/NAS), and backup/DR platforms (Commvault, Zerto).
- Optimize LogicMonitor alert configurations to reduce false positives and improve signal clarity.
- Refine BigPanda correlation logic for effective root cause identification and smarter incident response.
Automation and Scripting:
- Develop and maintain automation scripts and integrations using Python, PowerShell, and Bash to support event enrichment, ticket enrichment, and workflow automation.
- Support roadmap efforts for self-healing automation, telemetry standards, and event-driven workflows.
Collaboration and Reporting:
- Interface with Systems, Networking, and Storage teams to ensure comprehensive monitoring across all critical assets.
- Build and manage dashboards, uptime checks, synthetic monitors, and availability reports for real-time operational awareness.
- Contribute to incident review cycles with feedback loops to improve monitoring scope and reduce operational overhead.
Documentation and Support:
- Create and maintain clear, structured runbooks and playbooks for alert triage, escalation, and routine issue remediation.
Requirements
Required Qualifications:
Preferred Qualifications: