Dikshya's Portfolio
A brain dump of my experience, passions, and opinions
ERNST AND YOUNG LLP
EY Tax Alwin team
Chicago, IL
Job Role:
-
Leading the AI Tax team: building machine learning solutions for EY clients, both independently and in a supervisory role, and managing the Staff Data Scientists on my team. Helped clients save millions in tax returns and reduce both workload and staffing needs by introducing and implementing novel ML solutions for tax-related problems. Actively involved in identifying use cases, handling client communications, strategizing client-specific solutions, and presenting findings from A/B testing experiments.
-
Developing an end-to-end machine learning solution for a property tax use case, using clustering to build peer sets that support a better assessment of a property's net worth.
-
Designing an anomaly detection system to detect and extract entities that qualify for tax exemption in R&D credit solutions, scraping unstructured data from the IRS website to categorize clients' tax data into official tax categories.
-
Designing and implementing tax chatbots using BERT.
-
Conducted research and experiments on building a deep learning framework to extract domain-specific vectors or build a classification model, and on building custom NER models to extract dates that do not follow a set pattern.
-
Leading the initiative to create the Alwin Toolkit process: planning the overall architecture, then executing and optimizing code to make our approach to ML problems efficient and robust in terms of resources, and streamlining the entire process using Azure DevOps and Python.
Data Scientist - Manager
May 2022 – Continuing
Data Scientist - Senior
January 2020 – April 2022
AMAZON.COM
Payments R&D team
Seattle, WA, US
Applied Science Intern
May 2019 – August 2019
Job Role:
-
Analyzed the impact of customer purchasing trends to answer questions about sellers' future success and loan payment behavior, developing cascade models to identify ‘hidden’ influencers and, eventually, to build a seller's customer base.
-
Evaluated the newly added features through feature importance testing with trained XGBoost and random forest models, and by analyzing their p-values with regularized regression. The new features ranked in the top 10 percent of the existing 30k features.
-
Built an evaluation pipeline that uses A/B testing to check whether adding new features improves the current model. The pipeline can be reused to test any new features the team considers useful for its model.
TATA CONSULTANCY SERVICES
TCS Analytics and Insights Research Lab
Pune, MH, India
Data Science Developer
July 2016 - July 2018
Job Role:
-
Understood each client's requirements, business challenges, and data; framed the problem and defined its scope; executed the proof of concept within a couple of months; and delivered it to the client as a detailed report and presentation, after which the client decided whether to have us scale it further.
-
Researched and designed text-to-vector models for language understanding and generation applications, such as building appropriate domain-specific vocabularies and extracting named entities, among others.
-
Designed and implemented an end-to-end ML workflow that handles language complexities to extract and understand patient information, with the goal of developing an online doctor that could identify a patient's ailment from varied features.
-
Designed the internal workflow to process unstructured patient information extracted from PDFs using OCR, organizing and segmenting the data into structured tables in PostgreSQL. Also designed the schema for, and stored data in, MongoDB.
-
Designed and built APIs to extract data from sources such as PubMed and Wikipedia for domain-specific language models.
INDIAN INSTITUTE OF TECHNOLOGY
NLP Research Lab, Computer Science Department
Kharagpur, WB, India
NLP Research Intern
May 2015 - October 2015
Job Role:
-
Developed and improved summarization algorithms in Python, with the aim of running them on standard datasets and improving on the then state-of-the-art accuracy.
-
Worked on extracting, categorizing, and predicting user-related information from users' tweets. The task resembled sentiment analysis, except that the label to predict was whether or not the subject in question was drunk.