Software Architect (Docker/Slurm)

Role description

Software Architect (Docker/Slurm (Simple Linux Utility for Resource Management)). This service requires a specialized technical profile to carry out the installation, configuration, and maintenance of a computing infrastructure with 4 NVIDIA L40s GPUs. The infrastructure must be based on Docker for container management and Slurm as the job and resource queue manager, with vGPU technology enabled on the GPUs and monitoring through tools such as Prometheus, cAdvisor, DCGM Exporter, and Grafana. The possible integration and management of MIG (Multi-Instance GPU) should also be taken into account if required.

Primary Duties & Responsibilities

  • Install, configure, and maintain a computing infrastructure with 4 NVIDIA L40s GPUs.
  • Base the infrastructure on Docker for container management and Slurm as the job and resource queue manager, with vGPU technology enabled on the GPUs.
  • Apply expertise with monitoring tools such as Prometheus, cAdvisor, DCGM Exporter, and Grafana.
  • Integrate and manage MIG (Multi-Instance GPU) if required.


Education & Requirements

  • Minimum 5 years of experience installing and configuring Docker- and Slurm-based infrastructures.
  • Demonstrable experience in vGPU configuration and, if required, MIG (Multi-Instance GPU) configuration.
  • Advanced knowledge of Docker container management and GPU integration using the NVIDIA Container Toolkit.
  • Ability to configure MIG and vGPU on NVIDIA GPUs.
  • Experience configuring and customizing Grafana for visualization of resource usage metrics.


Preferred

  • Experience setting up infrastructure for AI model training and testing.

Recommended jobs

The Cervantes Group

Data Quality Developer

January 17, 2025
On-site: Dallas, United States
Full-time

The Data Quality Developer will assist the Data Governance & Data Quality teams in the implementation and production of data quality rules. This person analyzes and evaluates information technology systems operations to determine user needs and requirements, and recommends ways to improve systems. The ideal candidate comes from a development background and has experience meeting with users, business units, and data modeling teams to craft and assemble technical designs for solutions to be implemented. This person will interact with both technical and non-technical audiences within an Informatica PowerCenter IDQ development environment while providing production support and defining dataflows.

The Cervantes Group

Data Remediation Analyst

January 17, 2025
Remote
Full-time

The Data Remediation Analyst will be responsible for driving data remediation efforts and for identifying and correcting errors, inconsistencies, and inaccuracies in data. The ideal person has a strong background in data management and is passionate about making a positive impact in the financial industry. This person will monitor and track the progress of data remediation efforts, provide regular updates to management, conduct risk assessments, and implement risk mitigation strategies to ensure data integrity. They will also translate business problems into requirements, process changes, test cases, data mapping, etc., and serve as a liaison between numerous cross-functional teams, both technical and business.

C3 S.A. Inc

Business Intelligence Analyst

January 17, 2025
San Juan
Full-time

Analyzes needs and designs, writes, and tests new solutions to fulfill business needs. This role will be an integral component of the organization's data governance, delivering valuable information through Intelligence and Analytics and providing direct support to the entire company. Applies judgment in devising program logic by selecting and adapting standard programming procedures, ultimately providing insights that help the company make better decisions.

The Cervantes Group

Hardening Compliance Analyst

January 17, 2025
Remote
Full-time

The Hardening Compliance Analyst will work with the team to help approve the process to measure hardening compliance across various US-based entities. This person will help the team understand hardening compliance gaps by setting up the initial configuration of the compliance measurement tool, creating new compliance measurement profiles, and setting up the report templates needed. The ideal person is very comfortable assisting the Security teams in consolidating the way/tool used to measure hardening compliance.