Job Details
Revolutionize security.
Create the future of cybersecurity.
Principal MLOps Engineer (Cortex)
Our Mission
At Palo Alto Networks®, we’re united by a shared mission—to protect our digital way of life. We thrive at the intersection of innovation and impact, solving real-world problems with cutting-edge technology and bold thinking. Here, everyone has a voice, and every idea counts. If you’re ready to do the most meaningful work of your career alongside people who are just as passionate as you are, you’re in the right place.
Who We Are
To be the cybersecurity partner of choice, we must blaze the trail and shape the future of our industry. Our employees work toward this every day, guided by our values: Disruption, Collaboration, Execution, Integrity, and Inclusion. We weave AI into the fabric of everything we do and use it to amplify the impact each individual can have. If you are passionate about solving real-world problems and ideating alongside the best and the brightest, we invite you to join us!
We believe collaboration thrives in person. That’s why most of our teams work from the office full time, with flexibility when it’s needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.
Job Summary
We are looking for a Principal MLOps Engineer with a deep focus on ML platforms and infrastructure to join our Data & AI group at Cortex Research. Our team designs, builds, and scales the foundational MLOps and LLMOps platforms that power both our data scientists and our security researchers. You will architect the high-performance core infrastructure that enables these teams to build, train, and deploy advanced AI systems, ranging from optimized Small Language Models (SLMs) to complex agentic workflows and RAG systems. If you are passionate about building scalable compute platforms and automating the full ML lifecycle to solve complex data and security challenges, we want to hear from you.
Key Responsibilities
Scale Distributed Training: Design and optimize infrastructure for training and fine-tuning LLMs and SLMs, leveraging distributed GPU workloads, efficient clustering, and compute optimization.
Automate the ML Lifecycle: Architect robust, automated pipelines for continuous training (CT) and deployment (CD) of models, ensuring a seamless flow from raw data collection to production environments.
Build Model Infrastructure: Own the serving architecture for LLMs/SLMs, balancing latency, throughput, and GPU utilization under production traffic.
Implement Advanced Monitoring: Establish comprehensive observability systems to monitor live model performance, data drift, and computational metrics, feeding insights back into the automated training loops for continuous improvement.
Collaborative Architecture: Partner closely with data scientists and security researchers to productize complex model architectures and streamline their workflows, while collaborating with our DevOps team to integrate with core cloud infrastructure.
Qualifications
Required Qualifications
Core Engineering: 4+ years of hands-on experience as a Senior ML Engineer, MLOps Engineer, or Backend Platform Engineer working in cloud environments.
Model Lifecycle Engineering: Hands-on experience managing the technical lifecycle of diverse model architectures, spanning classic ML, LLMs/SLMs, and agentic/RAG systems. This includes engineering scalable data preparation and processing pipelines as well as implementing infrastructure for model training, fine-tuning, optimization, and high-throughput production serving.
Distributed Training & Compute: Strong foundational knowledge of Deep Learning concepts (neural network architectures, training dynamics, optimization techniques) paired with proven experience setting up and optimizing distributed training workloads across multiple GPUs (using PyTorch, DeepSpeed, Megatron-LM, or cloud-native training infrastructure).
Cloud & Infrastructure Architecture: Strong infrastructure knowledge within a major cloud provider ecosystem (GCP, AWS, or Azure), including hands-on use of its managed AI platforms and services.
Python Expertise: Expert-level Python skills focused on ML infrastructure, pipelines, and automation frameworks.
CI/CD Integration: Experience with modern CI/CD tooling and practices (such as GitLab CI or GitHub Actions) for automating software and model delivery.
AI Tooling & Development: Proficiency with day-to-day AI tools and ecosystems (e.g., Claude, Gemini, MCPs, custom skills, and markdown formatting) to generate, review, and test code as part of your development cycle.
Preferred Qualifications
Strong GCP ecosystem experience.
Background in data science or deep learning workflows.
Cybersecurity domain knowledge.
Our Commitment
We’re trailblazers who dream big, take risks, and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.
We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.
Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.
All your information will be kept confidential according to EEO guidelines.
Is this role eligible for immigration sponsorship? No. Please note that we will not sponsor applicants for work visas for this position.