Data Engineer at Motion Recruitment Arlington, TX
Hybrid
Contract
Junior Level (1-3 years)
Job Description
Position Overview
We have an immediate 6-month contract-to-hire opportunity for a Data Engineer at Motion Recruitment in Arlington, TX. This role requires working onsite in Arlington two days per week (Tuesdays and Wednesdays preferred) and involves developing scalable data systems for processing large semi-structured and unstructured data sets. The position supports both off-line and in-line machine learning training, as well as search-engine-based analytics, through batch and streaming data transformation processes.
Key Responsibilities
- Troubleshoot complex problems and work across teams to meet commitments.
- Contribute to the evaluation, research, and experimentation of batch and streaming data engineering technologies.
- Collaborate with data engineering groups to showcase and adopt emerging technologies.
- Define and refine processes and procedures for the data engineering practice.
- Work with data scientists, data architects, ETL developers, and business partners to capture and format data from diverse sources.
- Code, test, deploy, monitor, document, and troubleshoot data processing systems and associated automation.
- Comply with all company policies and procedures.
Required Qualifications
- Bachelor’s Degree in a related field or equivalent work experience.
- 4-6+ years of experience in data engineering.
- 3+ years of Python experience, including DataFrame manipulation and transformation logic.
- Strong SQL knowledge.
- Experience with unstructured/semi-structured data (e.g., JSON, XML).
- Experience with Databricks.
- Knowledge of Kimball star-schema dimensional modeling.
- 3-5 years of hands-on experience processing large data sets.
- 3-5 years of hands-on experience with SQL, data modeling, and working with relational and/or NoSQL databases.
- Strong interpersonal, verbal, and writing skills.
Preferred Qualifications
- Experience with processing large data sets using Hadoop, HDFS, Spark, Kafka, Flume, or similar distributed systems.
- Experience with ingesting various data formats such as JSON, Parquet, SequenceFile, and working with cloud databases.
- Experience with cloud technologies (Azure, AWS, GCP) and native toolsets such as Azure ARM templates, HashiCorp Terraform, or AWS CloudFormation.
- Understanding of cloud computing technologies, business drivers, and emerging trends.
- Familiarity with hybrid cloud computing models, virtualization technologies, and various cloud delivery models (IaaS, PaaS, SaaS).
- Working knowledge of object storage technologies such as Azure Data Lake Storage Gen2 (ADLS), S3, MinIO, or Ceph.
- Experience with containerization including Docker, Kubernetes, Spark on Kubernetes, or Spark Operator.
- Familiarity with Agile development frameworks (SAFe, Scrum) and Application Lifecycle Management.
- Experience with source control management systems, build systems, code quality tools, artifact repositories, and CI/CD pipelines.
- Experience with NoSQL data stores such as Azure Cosmos DB, MongoDB, Cassandra, or Redis, including technologies that integrate search capabilities.
- Experience in creating and maintaining ETL processes.
- Knowledge of IT governance and privacy compliance best practices.
- Experience with Adobe solutions (e.g., Adobe Experience Platform, DTM/Launch) and REST APIs.
- Proficiency in digital data collection and familiarity with digital technology solutions (DMPs, CDPs, Tag Management Platforms, etc.).
- Understanding of real-time CDP and journey analytics solutions.
- Knowledge of big data platforms, data stream processing pipelines, data lake architectures, and data lakehouses.
- Strong SQL querying skills with the ability to derive actionable insights.
- Understanding of cloud solutions and architectures on platforms such as Google Cloud Platform, Microsoft Azure, and AWS.
- Familiarity with GDPR, privacy, and security topics.
Required Skills
Data Modeling
SQL
Stream Processing
Databricks
Batch Processing
ETL Processes
Data Engineering
Semi-Structured Data Handling
Python