We are looking for a Data Engineer in Bangalore. This is an amazing opportunity to work in a data engineering environment including PySpark, Databricks, Python, and Apache technologies such as Spark and Hive.
About You – experience, education, skills, and accomplishments
- Bachelor's degree in computer science or equivalent experience
- 0-2 years of experience in software development
- Good experience in building massively scalable distributed data processing solutions
- Good experience in database design & development.
- Building Data Pipelines & ETL jobs using cloud-native technologies & design patterns
- Experience in designing resilient systems & creating disaster recovery plans
- Working in Master Data Management & designing CMSes, or evaluating 3rd party CMS products
- Working in Agile Scrum or Kanban teams & deploying solutions using Continuous Delivery best practices
- Using automated database migration tools, with strong opinions on version control best practices for SQL scripts.
Technical skills by area (must-have, plus desirable skills in one or more of the listed technologies):
NoSQL/Big Data
- Must-have: Apache Spark, Elasticsearch
- Desirable: Cassandra, Hadoop, Apache Hive, Snowflake, Jupyter notebooks, the Databricks stack
RDBMS
- Must-have: PostgreSQL or Oracle DB experience
- Desirable: Oracle 11g+, PostgreSQL 9+, AWS RDS
Languages
- Must-have: proficiency in one or more of Python, Scala, or Java; proficiency in SQL, PL/SQL, XML, and JSON
- Desirable: working knowledge of one or more of Java 6+, Python 2+, or JavaScript
Cloud Technologies and Tools
- Must-have: experience in designing cloud-based data pipelines & solutions
- Desirable: AWS services such as EMR, AWS Glue, S3, EC2, RDS, Aurora PostgreSQL, and Lambda; also Kubernetes, schedulers, and message queues
It would be great if you also had...
- Strong analytical skills and good verbal and written communication skills, given the number of internal customers you will be required to support
What you would be doing in this role:
- Provide technical thought leadership, compare different technologies to meet business requirements and cost control drivers.
- Work with Business and IT groups to design and deliver a data lake platform.
- Produce & maintain the overall solution design for the entire Data Lake Platform.
- Execute the data strategy and help design and architect solutions on the platform.
- Enforce technical best practices for Big Data management and solutions, from software selection to technical architectures and implementation processes.
- Document and publish best practices, guidelines, and training information.
- Ensure all functional solutions and components of the Data Lake platform service are designed and implemented so that they consistently meet SLAs.
- Contribute to the continuous improvement of the support & delivery functions by maintaining awareness of technology developments and making appropriate recommendations to enhance application services.
- Focus on data quality throughout the ETL & data pipelines, driving improvements to data management processes, data storage, and data security to meet the needs of the business and customers.
About the Team
You will be joining a team responsible for creating and maintaining internal tools that allow the company to turn unstructured data available on the internet into structured data that can then be cross-referenced and analyzed. The data is exposed to multiple products, which are in turn provided to our customers. You will interact with other teams to create a service mesh in which services communicate through asynchronous queue services hosted in AWS, roughly as sketched below.
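For a flavor of this kind of inter-service communication, here is a minimal sketch of two services exchanging work through an AWS SQS queue using boto3. The queue name, message shape, and job fields are hypothetical illustrations, not the team's actual code.

```python
# Illustrative sketch only: two services exchanging work items through an
# AWS SQS queue. Queue name and message fields are hypothetical.
import json

import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="document-extraction-jobs")["QueueUrl"]

# Producer side: publish a reference to some unstructured source data.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"job_id": 42, "source_url": "https://example.com/page"}),
)

# Consumer side: long-poll for work, process it, then acknowledge by deleting.
response = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=10
)
for message in response.get("Messages", []):
    job = json.loads(message["Body"])
    # ... extract structured records from job["source_url"] here ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```

Because the queues are asynchronous, producers and consumers scale and fail independently, which is what makes this pattern a good fit for a service mesh of data-processing tools.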
Hours of work
- Full-time
- 45 hours per week
- Hybrid working model
At Clarivate, we are committed to providing equal employment opportunities for all qualified persons with respect to hiring, compensation, promotion, training, and other terms, conditions, and privileges of employment. We comply with applicable laws and regulations governing non-discrimination in all locations.