Supervised label prediction for cosmetic patents using machine learning

In cooperation with Beiersdorf

Tasks:

This thesis focuses on analyzing a patent data set in order to predict patent labels and automate the classification process, which is currently still carried out manually. The goal is to identify and test different machine learning algorithms in order to improve prediction accuracy. Features for classifying the patents are derived from patent classification codes (e.g. IPC, CPC) and text data (e.g. patent claims, full text). A second classification task addresses specific information (e.g. cosmetic substances) contained in the independent claims of these patents.

Subtasks:

  • Preparing the patent data records and the underlying text data
  • Defining and extracting features
  • Training and testing various machine learning architectures
  • Identifying the best approach and fine-tuning its hyperparameters

Expectations:

  • Strong analytical skills and a passion for data
  • Willingness to cooperate with the R&D department of a leading global skin care company
  • Very good knowledge of English
  • Team player with an international mindset

Ideally:

  • Skills in R or Python
  • Experience in natural language processing
  • Experience in machine learning
Prof. Dr. Christoph Ihl
Professor & Head of Institute

My research interests include cultural entrepreneurship, social networks and natural language processing.