Jobs
We have the following job opportunities:
We are a team of experts providing solutions in Data Engineering & Modernization and Market Research, using the latest tools and technologies at the best prices to support various business needs.
Data Engineer
LOCATION: Pune, Maharashtra, India.
EXPERIENCE: 2-4 Years
ROLE DESCRIPTION:
As a Data Engineer, you will be hands-on in architecting, building, and optimizing robust, efficient, and secure data pipelines and platforms that power business-critical analytics and applications. You will play a central role in the implementation and automation of scalable batch and streaming data workflows using modern big data and cloud technologies. Working within cross-functional teams, you will deliver well-engineered, high-quality code and data models, and drive best practices for data reliability, lineage, quality, and security.
KEY RESPONSIBILITIES:
– Design, build, and optimize scalable data pipelines and ETL/ELT workflows using Spark (Scala/Python), SQL, and orchestration tools (e.g., Apache Airflow, Prefect, Luigi).
– Implement efficient solutions for high-volume, batch, real-time streaming, and event-driven data processing, leveraging best-in-class patterns and frameworks.
– Build and maintain data warehouse and lakehouse architectures (e.g., Snowflake, Databricks, Delta Lake, BigQuery, Redshift) to support analytics, data science, and BI workloads.
– Develop, automate, and monitor Airflow DAGs/jobs on cloud or Kubernetes, following robust deployment and operational practices (CI/CD, containerization, infra-as-code); a minimal illustrative DAG is sketched after this list.
– Write performant, production-grade SQL for complex data aggregation, transformation, and analytics tasks.
– Ensure data quality, consistency, and governance across the stack, implementing processes for validation, cleansing, anomaly detection, and reconciliation.
– Collaborate with Data Scientists, Analysts, and DevOps engineers to ingest, structure, and expose structured, semi-structured, and unstructured data for diverse use-cases.
– Contribute to data modeling, schema design, data partitioning strategies, and ensure adherence to best practices for performance and cost optimization.
– Implement, document, and extend data lineage, cataloging, and observability through tools such as AWS Glue, Azure Purview, Amundsen, or open-source technologies.
– Apply and enforce data security, privacy, and compliance requirements (e.g., access control, data masking, retention policies, GDPR/CCPA).
– Take ownership of the end-to-end data pipeline lifecycle: design, development, code reviews, testing, deployment, operational monitoring, and maintenance/troubleshooting.
– Contribute to frameworks, reusable modules, and automation to improve development efficiency and maintainability of the codebase.
– Stay abreast of industry trends and emerging technologies, participating in code reviews, technical discussions, and peer mentoring as needed.
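As an illustration of the orchestration work described in the responsibilities above, here is a minimal Airflow DAG sketch (Airflow 2.x style). The DAG name, task names, and callables are hypothetical placeholders rather than an actual pipeline; a production DAG would add retries, alerting, and environment-specific configuration.

# Minimal, illustrative Airflow DAG (Airflow 2.x style).
# All DAG/task names and the callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: in practice this would pull data from an API, database, or object store.
    print("extracting raw data to a staging location")


def transform(**context):
    # Placeholder: in practice this might submit a Spark job or run SQL/dbt transformations.
    print("transforming staged data into curated tables")


with DAG(
    dag_id="sales_daily_pipeline",       # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task       # simple linear dependency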
SKILLS & EXPERIENCE:
– Experience with at least one major cloud platform (GCP, AWS, or Azure).
– Proficiency with Spark (Python or Scala), SQL, and data pipeline orchestration (Airflow, Prefect, Luigi, or similar).
– Experience with cloud data ecosystems (AWS, GCP, Azure) and cloud-native services for data processing (Glue, Dataflow, Dataproc, EMR, HDInsight, Synapse, etc.).
– Hands-on development skills in at least one programming language (Python, Scala, or Java preferred); solid knowledge of software engineering best practices (version control, testing, modularity).
– Deep understanding of batch and streaming architectures (Kafka, Kinesis, Pub/Sub, Flink, Structured Streaming, Spark Streaming).
– Expertise in data warehouse/lakehouse solutions (Snowflake, Databricks, Delta Lake, BigQuery, Redshift, Synapse) and storage formats (Parquet, ORC, Delta, Iceberg, Avro).
– Strong SQL development skills for ETL, analytics, and performance optimization.
– Familiarity with Kubernetes (K8s), containerization (Docker), and deploying data pipelines in distributed/cloud-native environments.
– Experience with data quality frameworks (Great Expectations, Deequ, or custom validation), monitoring/observability tools, and automated testing.
– Working knowledge of data modeling (star/snowflake, normalized, denormalized) and metadata/catalog management.
– Understanding of data security, privacy, and regulatory compliance (access management, PII masking, auditing, GDPR/CCPA/HIPAA).
– Familiarity with BI or visualization tools (Power BI, Tableau, Looker, etc.) is an advantage but not mandatory.
– Previous experience with data migrations, modernization, or refactoring legacy ETL processes to modern cloud architectures is a strong plus.
– Nice to have: Exposure to open-source data tools (dbt, Delta Lake, Apache Iceberg, Amundsen, Great Expectations, etc.) and knowledge of DevOps/MLOps processes.
PROFESSIONAL ATTRIBUTES:
– Strong analytical and problem-solving skills; attention to detail and commitment to code quality and documentation.
– Ability to communicate technical designs and issues effectively with team members and stakeholders.
– Proven self-starter, fast learner, and collaborative team player who thrives in dynamic, fast-paced environments.
– Passion for mentoring, sharing knowledge, and raising the technical bar for data engineering practices.
DESIRABLE EXPERIENCE:
– Contributions to open source data engineering/tools communities.
– Implementing data cataloging, stewardship, and data democratization initiatives.
– Hands-on work with DataOps/DevOps pipelines for code and data.
– Knowledge of ML pipeline integration (feature stores, model serving, lineage/monitoring integration) is beneficial.
EDUCATIONAL QUALIFICATIONS:
– Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field (or equivalent experience).
– Certifications in cloud platforms (AWS, GCP, Azure) and/or data engineering (AWS Data Analytics, GCP Data Engineer, Databricks).
– Experience working in an Agile environment with exposure to CI/CD, Git, Jira, Confluence, and code review processes.
– Prior work in highly regulated or large-scale enterprise data environments (finance, healthcare, travel, retail, CPG, or similar) is a plus.
AI/ML Engineer
LOCATION: Pune, Maharashtra, India.
EXPERIENCE: 2-4 Years
ROLE DESCRIPTION:
We are looking for an AI & Generative AI Developer who can work across the AI spectrum, from classical machine learning models to cutting-edge Generative AI applications. The role demands strong experience in building ML models using regression, classification, and tree-based algorithms, along with hands-on exposure to LLMs and generative frameworks such as GPT, Stable Diffusion, and LangChain.
KEY RESPONSIBILITIES:
Classical AI/ML:
– Design and implement supervised and unsupervised ML models, including Linear Regression, Logistic Regression, Decision Trees, Random Forest, XGBoost, Naive Bayes, K-Means, SVM, PCA, etc.
– Preprocess and analyse structured/tabular datasets
– Evaluate models using metrics like accuracy, precision, recall, ROC-AUC, and RMSE
– Build predictive models, deploy them into production, and monitor performance (a minimal training-and-evaluation sketch follows this list)
– Collaborate with business teams to translate requirements into ML use cases
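As a minimal illustration of the model-building and evaluation work listed above, the snippet below trains a logistic regression classifier on a synthetic tabular dataset and reports accuracy and ROC-AUC. The dataset and parameters are made up purely for illustration.

# Toy end-to-end example: train a classifier and evaluate it with
# accuracy and ROC-AUC. The synthetic dataset is purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic tabular data standing in for a real business dataset.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features, then fit a logistic regression model.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000))
model.fit(X_train, y_train)

# Evaluate with the metrics mentioned above.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, pred))
print("ROC-AUC :", roc_auc_score(y_test, proba))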
Generative AI (GenAI):
– Build and fine-tune LLMs (e.g., GPT, LLaMA, PaLM) for summarisation, Q&A, document generation, etc.
– Implement prompt engineering, RAG pipelines, and vector database integrations (a minimal retrieval sketch follows this list)
– Use libraries like Hugging Face Transformers, LangChain, and LlamaIndex
– Develop APIs to expose GenAI models in real-time apps
– Optimise model inference using quantisation, batching, etc.
– Ensure safe, explainable, and bias-free output in alignment with AI ethics guidelines
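The sketch below illustrates the retrieval half of a simple RAG pipeline: documents are embedded, indexed in FAISS, and the passages most relevant to a question are assembled into a prompt. The documents, the embedding model choice, and the omitted LLM call are illustrative assumptions, not a prescribed stack.

# Minimal retrieval-augmented generation (RAG) sketch: embed documents,
# retrieve the most relevant ones for a question, and assemble a prompt.
# The corpus, model name, and final LLM call are illustrative placeholders.
import faiss
from sentence_transformers import SentenceTransformer

documents = [
    "Invoices are processed within 3 business days.",        # hypothetical corpus
    "Refund requests must include the original order ID.",
    "Support is available Monday to Friday, 9am-6pm IST.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")            # small open embedding model
doc_vectors = embedder.encode(documents, convert_to_numpy=True).astype("float32")

index = faiss.IndexFlatL2(doc_vectors.shape[1])               # simple in-memory vector index
index.add(doc_vectors)

question = "How long does invoice processing take?"
query_vec = embedder.encode([question], convert_to_numpy=True).astype("float32")
_, top_ids = index.search(query_vec, 2)                       # retrieve top-2 passages

context = "\n".join(documents[i] for i in top_ids[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# In a real pipeline the prompt would now be sent to an LLM
# (e.g. via the OpenAI API or a Hugging Face model); omitted here.
print(prompt)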
Additional Skills:
– Strong programming skills in Python, with experience in NumPy, Pandas, and Scikit-learn
– Proficiency in classical ML algorithms (regression, trees, naive Bayes, etc.)
– Experience with LLM frameworks like OpenAI API, Hugging Face, and LangChain
– Understanding of transformer architecture, NLP, embeddings, and tokenisation
– Familiarity with REST API development using FastAPI/Flask (see the minimal serving sketch after this list)
– Exposure to cloud platforms (AWS/GCP/Azure) and Docker/Kubernetes
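As a sketch of the REST API serving skills mentioned above, the snippet below wraps a stubbed prediction function in a FastAPI endpoint. The endpoint path, request/response schema, and dummy scoring logic are hypothetical; a real service would load and call an actual ML/LLM model.

# Minimal FastAPI sketch for serving a model prediction endpoint.
# The request schema and the stubbed scoring logic are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="example-model-service")


class PredictRequest(BaseModel):
    text: str


class PredictResponse(BaseModel):
    label: str
    score: float


@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # Stub: a real service would call a loaded ML/LLM model here.
    score = min(len(req.text) / 100.0, 1.0)
    return PredictResponse(label="positive" if score > 0.5 else "negative", score=score)

# Run locally with:  uvicorn app:app --reload   (assuming this file is saved as app.py)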
Preferred / Nice to Have:
– Experience with deep learning (TensorFlow, PyTorch)
– Exposure to image/audio/video generation using models like DALL·E, Stable Diffusion, and Whisper
– Familiarity with RAG, LLMOps, and vector stores (FAISS, Pinecone, Weaviate)
– Knowledge of MLOps pipelines, model monitoring, and CI/CD for ML
PROFESSIONAL ATTRIBUTES:
– Strong analytical and problem-solving skills; attention to detail and commitment to code quality and documentation.
– Ability to communicate technical designs and issues effectively with team members and stakeholders.
– Proven self-starter, fast learner, and collaborative team player who thrives in dynamic, fast-paced environments.
– Passion for mentoring, sharing knowledge, and raising the technical bar for AI/ML engineering practices.
DESIRABLE EXPERIENCE:
– Contributions to open source data engineering/tools communities.
– Implementing data cataloging, stewardship, and data democratization initiatives.
– Hands-on work with DataOps/DevOps pipelines for code and data.
– Knowledge of ML pipeline integration (feature stores, model serving, lineage/monitoring integration) is beneficial.
EDUCATIONAL QUALIFICATIONS:
– Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field (or equivalent experience).
– Certifications in AI/ML cloud platforms (AWS, GCP, Azure) and/or data engineering (AWS Data Analytics, GCP Data Engineer, Databricks).
– Experience working in an Agile environment with exposure to CI/CD, Git, Jira, Confluence, and code review processes.
– Prior work in highly regulated or large-scale enterprise data environments (finance, healthcare, travel, retail, CPG, or similar) is a plus.
Would you like to start a project with us?
Connect with us for best-in-class consulting, solution design, and implementation experience.