
Data Analytics Engineer (AI/ML Focus)
Tata Consultancy Services, Sunnyvale, CA, United States
Responsibilities
Data Pipeline & Infrastructure Development: Build, maintain, and scale data pipelines (ETL or ELT) using tools like Apache Spark, Airflow, and Kafka to support AI and ML workloads.
AI-Ready Data Preparation: Transform messy, unstructured data (text, images, video) into structured datasets suitable for model training, including feature engineering and vector database ingestion.
ML Model Productionization: Partner with data scientists to deploy ML models, build model-serving APIs, and implement MLOps practices, including monitoring for data drift.
Analytics and Visualization: Create dashboards (Tableau, Power BI, Looker) and run SQL queries to provide actionable business insights, acting as an analytics engineer.
Data Governance & Quality: Ensure data quality, reliability, and security (including protection of PII and PHI) within AI systems, in compliance with regulations such as GDPR and HIPAA.
Cloud and Data Management: Operate within cloud environments (AWS, Azure, Google Cloud) using services like S3, Redshift, Glue, or Databricks.
Key Skills and Qualifications
Programming Languages: Expert-level Python and advanced SQL are required. Java or Scala is preferred for large-scale distributed systems.
ML Frameworks: Familiarity with libraries such as PyTorch, TensorFlow, or scikit-learn for data manipulation and model interaction.
Data Engineering Tools: Experience with Apache Spark, Kafka, Airflow, dbt, and vector databases (Pinecone, Milvus).
Cloud Platforms: Hands-on experience with AWS (Glue, SageMaker) or GCP.
Analytical Skills: Strong ability to perform exploratory data analysis (EDA) and interpret complex datasets.
Soft Skills: Strong communication skills to bridge technical data engineering work and business stakeholders.
Salary Range: $70,000 - $125,000 per year