Senior Data Engineer

You.com

Data Science
United States
Posted on Saturday, June 15, 2024

About You.com:

Built on a deep foundation in search, You.com aims to be truthful, accurate, and transparent, and to address hallucinations.

You.com was founded by leading AI research scientists, Richard Socher and Bryan McCann. Richard was previously the Chief Scientist at Salesforce and is the third most-cited researcher in natural language processing (NLP) with over 170,000 citations. Bryan was a lead research scientist at Salesforce Research specializing in deep learning and NLP. Over the years, Richard and Bryan’s collaborative research has had significant implications for the field of NLP, particularly in the areas of word vectors, contextual vectors, and prompt engineering. Richard's contributions were recently recognized with his inclusion in Time Magazine’s TIME100 AI list in 2023 as one of the “most influential people in AI” and the prestigious 2023 ACL Test-of-Time Paper Award for his influential research published in 2013.

Since its founding, You.com has transformed how people discover and engage with information online as an AI Assistant that helps people accomplish and solve everyday needs. Recognized as one of Fortune Magazine’s 50 AI Innovators for 2023 and featured in Time Magazine’s “Best Inventions of 2022,” You.com has pioneered many solutions for Large Language Model (LLM) challenges, especially around trust and accuracy. You.com notably introduced the first consumer-facing LLM with access to the internet to provide up-to-date answers and include citations. You.com's API further enables other LLM-based chatbots to improve their accuracy with real-time web access. You.com also emphasizes personalized AI chat experiences, offering tailored responses based on users’ backgrounds, interests, and preferences while respecting privacy and ensuring transparent control over personal data.

You.com is accessible on desktop, Chrome web extensions, iOS and Android apps, and WhatsApp.

About the Role:

As Senior Data Engineer - Analytics, you will work cross-functionally establishing data engineering and data science excellence to help You.com grow its product. You will optimize data warehouse design and performance, evolve critical product analytics systems, enable and expand product data use cases, and help develop a world-class data culture. You are an ideal candidate if you take pride in your dual expertise as both a data engineer and a data scientist, working on projects end-to-end to understand user behavior and growth.

Responsibilities:

  • Data Pipeline Development: Design, build, and maintain robust and efficient data pipelines and APIs. These will collect, process, and serve data from diverse sources such as backend events, customer interactions, marketing channels, and LLM evaluations, contributing directly to our next generation of data-driven product growth.

  • Cross-functional Collaboration: Work with partners across functions, including product managers, marketing teams, and data scientists. Identify and prioritize opportunities for significant business impact, understand requirements for data infrastructure, drive engineering decisions, and quantify impact.

  • Scale and Optimize: Design and implement scalable and extensible data architectures and ETL processes to manage and understand our rapidly growing user base. Optimize data pipelines for enhanced performance, scalability, and reliability.

  • Operational Excellence: Efficiently manage resources in cloud environments (AWS/Azure) using tools like Terraform and Kubernetes. Own end-to-end event instrumentation, ensuring data completeness and correctness.

Qualifications:

  • Educational & Professional Experience: Bachelor’s degree in Computer Science or related field, or at least 4 years of experience in a Data Engineering role.

  • Technical Expertise:

    • Proficient in distributed processing frameworks (Databricks/Spark), stream processing, and event-driven technologies (e.g., Kafka), including their integration via REST and Python APIs.

    • Advanced skills in Python and Spark technologies (Spark SQL, DataFrames, Spark Streaming, RDD caching, Spark MLlib) with a proven ability to develop, debug, and optimize Spark code.

    • Experienced in using Terraform for infrastructure automation and familiar with cloud platforms like Azure and AWS.

    • Skilled in modeling and scaling event telemetry systems for analytical purposes.

    • Demonstrated ability to analyze large data sets to identify gaps and inconsistencies, provide data insights, and advance effective product solutions.

  • Communication and Mindset:

    • A proactive, "get things done" attitude, with a track record of adapting to new domains and using data to drive product improvements.

    • Excellent problem-solving and analytical abilities, with the capacity to simplify complex technical topics for diverse audiences.

Our Perks:

  • A remote-first work environment with hubs in California, NYC, and Canada that host monthly in-person gatherings.

  • Unlimited PTO with 11 U.S. holidays observed and a one-week shutdown in December to rest and recharge.

  • Competitive health insurance plan, with 100% of the policyholder's coverage paid.

  • 12 weeks of paid paternity leave in the US; additional time off is also considered.

  • 401(k) program.

  • $500 work-from-home stipend, to be used within a year of your start date.

  • In-person coworking weeks one to two times a year.

  • Chance to collaborate with a team at the forefront of AI research.