AWS Data Architect

LTIMindtree
4 months ago
Torrance, CA, United States
On-site
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

We are seeking an AWS Data Architect to design and implement a scalable data hub solution on AWS. This full-time role is based on-site in Torrance, CA, and requires candidates to be U.S. citizens (USC) or green card (GC) holders.

Key Responsibilities

  • Data Architecture Design: Architect and implement a scalable data hub solution on AWS using best practices for data ingestion, transformation, storage, and access control. Define data models, data lineage, and data quality standards for the DataHub. Select appropriate AWS services (S3, Glue, Redshift, Athena, Lambda) based on data volume, access patterns, and performance requirements. Conceptualize a design that accommodates AI/ML applications in future phases.
  • Data Ingestion and Integration: Design and build data pipelines to extract, transform, and load data from various sources (databases, APIs, flat files) into the DataHub using AWS Glue, AWS Batch, or custom ETL processes. Implement data cleansing and normalization techniques to ensure data quality. Manage data ingestion schedules and error handling mechanisms.
  • Data Governance and Access Control: Establish data access controls and security policies using IAM roles and policies. Develop frameworks for data quality checks, data lineage tracking, and data retention.
  • Data Analytics Enablement: Create data catalogs and metadata management systems to facilitate data discovery and understanding. Design and implement data views and dashboards using Power BI for data exploration and visualization. Build data warehouses and data marts to meet business needs.
  • Monitoring and Optimization: Monitor data pipeline performance, data quality, and system health; proactively resolve issues. Optimize storage and processing costs by leveraging AWS cost optimization features.
  • Data Exchange: Develop the governance, security, monitoring, and guard rails to enable efficient data exchange between internal applications and external vendors, partners, and SaaS providers. Establish intake processes, SLAs, and usage rules for both internal and external data set producers and consumers.

Required Qualifications

  • AWS Expertise: Deep understanding of AWS data services including S3, Glue, Redshift, Athena, Lake Formation, Step Functions, CloudWatch, and EventBridge.
  • Data Modeling: Proficiency in designing dimensional and snowflake data models for data warehousing and data lakes.
  • Data Engineering Skills: Experience with ETL/ELT processes, data cleansing, data transformation, and data quality checks. Familiarity with Informatica IICS and ICDQ is a plus.
  • Programming Languages: Proficiency in Python and SQL, and ideally PySpark, for data processing and manipulation.
  • Data Governance: Knowledge of best practices including data classification, access control, and data lineage tracking.

Preferred Qualifications

  • Experience with data lakehouse architectures and leveraging both structured and unstructured data.
  • Familiarity with data visualization tools like Tableau or Power BI.
  • Strong communication and collaboration skills for effectively working with both business and technical stakeholders.
  • AWS certifications related to data analytics and architecture.

Required Skills

Power BI
AWS Redshift
IAM Roles & Policies
Data Modeling
Data Ingestion & Integration
AWS S3
Data Governance
SQL
AWS Lambda
ETL/ELT Processes
Python
AWS Athena
AWS Glue
Data Architecture Design