
Hadoop administrator

3B Staffing LLC, San Jose, CA, United States


Tasks and Responsibilities
• The Hadoop administrator provides support and maintenance for the Hadoop platform and its ecosystem, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, Cloudera Workbench, etc.
• Accountable for storage, performance tuning and volume management of Hadoop clusters and MapReduce routines
• Deploys Hadoop clusters, adds and removes nodes, keeps track of jobs, monitors critical parts of the cluster, configures NameNode high availability, schedules and configures jobs, and takes backups.
• Installs and configures software, installs patches, and upgrades software as needed.
• Performs capacity planning and implements new/upgraded hardware and software releases for the storage infrastructure.
• Handles cluster design, capacity planning, cluster setup, performance tuning, monitoring, structure planning, scaling, and administration.
• Communicates with development, administration, and business teams, including infrastructure, application, network, database, and business intelligence teams.
• Responsible for Data Lake and Data Warehousing design and development.
• Collaborates with various technical and non-technical resources, such as infrastructure and application teams, on project work, POCs (proofs of concept), and/or troubleshooting exercises.
• Configures Hadoop security, specifically Kerberos integration, with the ability to implement it.
• Creates and maintains job and task schedules and administers jobs.
• Responsible for data movement in and out of Hadoop clusters and for data ingestion using Sqoop and/or Flume.
• Review Hadoop environments and determine compliance with industry best practices and regulatory requirements.
• Models, designs, and implements data based on recognized standards.
• Serves as a key contact for vendor escalation.
• Participates in an on-call rotation to support a 24/7 environment and is expected to work outside business hours to support corporate needs.
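As a concrete illustration of the NameNode high-availability duty listed above, the relevant hdfs-site.xml settings typically look like the sketch below. The nameservice ID "mycluster" and the host names are hypothetical, not from this posting:

```xml
<!-- Sketch of NameNode HA configuration; "mycluster" and hosts are illustrative. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

In practice this is paired with a shared edits mechanism (e.g., a JournalNode quorum) and ZooKeeper-based automatic failover.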

Minimum Qualifications:
• Bachelor's degree in Information Systems, Engineering, Computer Science, or related field from an accredited university.
• Intermediate experience in a Hadoop production environment.
• Must have intermediate experience and expert knowledge with at least 4 of the following:

o Hands on experience with Hadoop administration in Linux and virtual environments.

o Well versed in installing & managing distributions of Hadoop (Cloudera).

o Expert knowledge of and hands-on experience with Hadoop ecosystem components, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, Cloudera Workbench, etc.

o Thorough knowledge of the overall Hadoop architecture.

o Experience using and troubleshooting Open Source technologies including configuration management and deployment.

o Data Lake and Data Warehousing design and development.

o Experience reviewing existing DB and Hadoop infrastructure and determining areas of improvement.

o Implementing software lifecycle methodology to ensure supported release and roadmap adherence.

o Configuring high availability of NameNodes.

o Scheduling and taking backups for Hadoop ecosystem.

o Data movement in and out of Hadoop clusters.

o Good hands-on scripting experience in a Linux environment.

o Experience in project management concepts, tools (MS Project) and techniques.

o A record of working effectively with application and infrastructure teams.
• Strong ability to organize information, manage tasks and use available tools to effectively contribute to a team and the organization.
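As an illustration of the Linux scripting experience listed above, here is a minimal sketch of the kind of health check an administrator might run on cluster nodes. The helper name and default threshold are hypothetical:

```shell
# Hypothetical helper: list mounted filesystems whose usage exceeds a threshold.
# Uses POSIX `df -P` output, so it works the same across Linux distributions.
check_disk_usage() {
  local limit="${1:-80}"  # percent-used threshold; default 80%
  df -P | awk -v limit="$limit" '
    NR > 1 {
      gsub(/%/, "", $5)                       # strip "%" from the Capacity column
      if ($5 + 0 > limit) printf "%s %s%%\n", $6, $5
    }'
}
```

Run as `check_disk_usage 90` to print only mount points above 90% usage; a cron job could pipe the output to an alerting tool.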