Machine Learning Engineer
Clara
Clara is the leading spend management platform for companies in Latin America. Our end-to-end solution includes locally issued corporate cards, a Bill Pay product, financing solutions, and our highly rated software platform, already used by thousands of the most successful companies across the region.
Clara is backed by top global and regional investors such as GGV Capital, Coatue, DST Global Partners, General Catalyst, monashees, Acrew Capital, Kaszek, Citius, Canary, Citi Ventures, Picus Capital, Avid Ventures, ICONIQ Growth, Endeavor Catalyst, Goldman Sachs, Accial Capital, and prominent angel investors.
Your primary focus will be to develop data pipelines for ML; design, implement, and deploy ML models and infrastructure; improve existing processes; and create data models. You will also be responsible for maintaining and operating these models and data pipelines, and for collaborating with other teams to deliver high-quality ML models quickly.
The main challenges and responsibilities you will face are:
- Build, integrate and maintain ML feature data pipelines following reliability, scalability, and maintainability principles.
- Deploy, monitor, and operate the ML model lifecycle (MLOps).
- Collaborate with data scientists in the implementation of ML models.
- Ensure and monitor data quality, and investigate and resolve any issues detected.
- Consume data from Data Vault and star schema models in the lakehouse.
- Be able to collaborate with teams remotely, not only locally.
- Stay up to date with the latest technologies and look for ways to implement them.
- Collaborate with our data scientists and data engineers to better understand their requirements and meet the company’s goals.
- Document your work so that it provides a solid foundation for new team members and future reference.
- Take part in code reviews and pair programming.
- Enforce team best practices and DevOps as a culture within the team.
- Enforce Data Security and Data Privacy best practices.
- Proficiency in Python and SQL.
- ML Algorithms experience, both theoretical and practical.
- Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins, etc.).
- Experience using GitHub flow.
- Docker container experience.
- Experience implementing the feature store paradigm.
- Experience deploying streaming ML models with common ML frameworks (e.g. XGBoost, scikit-learn, CatBoost).
- High-level understanding of distributed systems and Spark architecture.
- Experience integrating data from databases, APIs, and event streams (Kinesis, Kafka).
- Experience developing complex data pipelines (ETL/ELT) with orchestration tools (e.g. Apache Airflow, AWS Glue workflows, AWS Step Functions).
- Experience with AWS services (Redshift, Glue, SageMaker, Lambda, Athena, S3, Kinesis) or equivalent services from other cloud providers.
- Experience with big data technologies and frameworks (e.g. (Py)Spark, Scala, Hive, Kafka).
- Databricks platform usage and administration.
- MLflow.