Supervised Label Prediction for Cosmetic Patents Using Machine Learning
AI
NLP
Predicting and automating the classification process for cosmetic patents using machine learning algorithms
Tasks
This thesis focuses on analyzing a patent dataset to predict and automate the classification process, which is currently carried out manually. The goal is to identify and test different machine learning algorithms to improve prediction accuracy. Features are derived from patent codes (e.g. IPC, CPC) and text data (e.g. patent claims, full text). A second classification task targets specific information (e.g. cosmetic substances) within the independent claims.
Subtasks
- Preparation of patent data records and underlying text data
- Definition and extraction of features
- Training and testing of various machine learning architectures
- Identification of the best approach and fine-tuning of hyperparameters
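
As a rough illustration of how these subtasks could fit together, the following minimal sketch combines code-based and text-based features, trains a simple classifier, and compares one hyperparameter setting against another. Everything specific in it is an assumption made for illustration: the toy records, the labels "skin_care" and "hair_care", the helper names, and the choice of a small multinomial Naive Bayes model written in plain Python. The actual thesis would work with the real dataset and would likely compare several stronger models.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(ipc_codes, claim_text):
    """Turn one patent record into a list of string features (hypothetical helper)."""
    # Code features: keep the 4-character IPC subclass, e.g. "A61K" from "A61K 8/34".
    code_feats = ["ipc=" + code.replace(" ", "")[:4] for code in ipc_codes]
    # Text features: lowercase word tokens from the claim text.
    word_feats = ["w=" + tok for tok in re.findall(r"[a-z]+", claim_text.lower())]
    return code_feats + word_feats


class NaiveBayes:
    """Multinomial Naive Bayes with additive smoothing (alpha is the hyperparameter)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for feats, label in zip(feature_lists, labels):
            self.feat_counts[label].update(feats)
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def leave_one_out_accuracy(records, alpha):
    """Train on all records but one, test on the held-out record, and average."""
    hits = 0
    for i, (codes, text, label) in enumerate(records):
        rest = records[:i] + records[i + 1:]
        model = NaiveBayes(alpha).fit(
            [extract_features(c, t) for c, t, _ in rest],
            [l for _, _, l in rest],
        )
        hits += model.predict(extract_features(codes, text)) == label
    return hits / len(records)


# Toy records (invented for illustration): (IPC codes, claim text, label).
records = [
    (["A61K 8/34"], "A cosmetic composition comprising glycerol and water", "skin_care"),
    (["A61Q 19/00"], "A skin cream comprising a moisturizing agent", "skin_care"),
    (["A61Q 5/02"], "A shampoo composition comprising a surfactant", "hair_care"),
    (["A61K 8/46", "A61Q 5/12"], "A hair conditioner comprising a cationic polymer", "hair_care"),
]

# Hyperparameter search: pick the smoothing value with the best held-out accuracy.
best_alpha = max([0.1, 1.0, 10.0], key=lambda a: leave_one_out_accuracy(records, a))

model = NaiveBayes(alpha=1.0).fit(
    [extract_features(c, t) for c, t, _ in records],
    [l for _, _, l in records],
)
prediction = model.predict(
    extract_features(["A61Q 5/02"], "A hair shampoo comprising a surfactant")
)
```

With only four toy records the held-out accuracy is not meaningful; the loop is there to show the shape of the evaluation, which stays the same when the records and the model are replaced by real ones.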
Expectations
- Strong analytical skills and passion for data
- Willingness to cooperate with the R&D department of a leading global skin care company
- Very good knowledge of English
- Team player with an international mindset
Ideally:
- Skills in R or Python
- Experience in natural language processing
- Experience in machine learning
