Jonas Wilinski

Research Assistant & Doctoral Student

  • Building Q Room 1.032
  • +49 40 42878-4738
  • jonas.wilinski@tuhh.de

Biography

My research interests span AI and machine learning, together with their managerial and cultural implications.

Research Interests

  • Machine learning and AI, especially in combination with NLP and computer vision
  • Crowdsourced knowledge and innovation
  • Social Engineering

Appointments & Education

  • PhD in Management & Engineering
    Hamburg University of Technology, Germany
    2022 - current
  • CTO & Product Owner
    Startup “Cargofaces”, Germany
    2021 - 2022
  • Master of Science in International Management and Engineering
    Hamburg University of Technology, Germany
    2019 - 2021
  • Working Student
    Hamburg University of Technology, Germany
    2020 - 2021
  • Project Manager
    Dr. Ing. h. c. F. Porsche AG, Germany
    2018 - 2019
  • Internship in the automotive industry
    Dr. Ing. h. c. F. Porsche AG, Germany
    2017 - 2018
  • Bachelor of Science in Electrical Engineering and Management
    Kiel University, Germany
    2014 - 2018
  • Working Student
    Kiel University, Germany
    2016 - 2017

Selected Publications

Jonas Wilinski
The Science Data Lake: A Unified Open Infrastructure Integrating 293 Million Papers Across Eight Scholarly Sources with Embedding-Based Ontology Alignment
arXiv (2026)

Working Paper

Scholarly data are largely fragmented across siloed databases with divergent metadata and missing linkages among them. We present the Science Data Lake, a locally-deployable infrastructure built on DuckDB and simple Parquet files that unifies eight open sources - Semantic Scholar, OpenAlex, SciSciNet, Papers with Code, Retraction Watch, Reliance on Science, a preprint-to-published mapping, and Crossref - via DOI normalization while preserving source-level schemas. The resource comprises approximately 960GB of Parquet files spanning ~293 million uniquely identifiable papers across ~22 schemas and ~153 SQL views. An embedding-based ontology alignment using BGE-large sentence embeddings maps 4,516 OpenAlex topics to 13 scientific ontologies (~1.3 million terms), yielding 16,150 mappings covering 99.8% of topics (≥ 0.65 threshold) with F1 = 0.77 at the recommended ≥ 0.85 operating point, outperforming TF-IDF, BM25, and Jaro-Winkler baselines on a 300-pair gold-standard evaluation. We validate through 10 automated checks, cross-source citation agreement analysis (pairwise Pearson r = 0.76 - 0.87), and stratified manual annotation. Four vignettes demonstrate cross-source analyses infeasible with any single database. The resource is open source, deployable on a single drive or queryable remotely via HuggingFace, and includes structured documentation suitable for large language model (LLM) based research agents.

Scholarly Data, Data Infrastructure, Ontology Alignment, Open Science

Oliver Specht, Jonas Wilinski, Jürgen Christopher Thiesen, Christoph Ihl
How Founders Evaluate VCs: A GPT-Based Extraction of Value-Criteria from Online VC Reviews
Academy of Management Proceedings 2025(1), 18846 (2025)

Conference Paper

Venture Capital (VC) investments positively impact startup success, enhancing operational performance through factors like collaboration and value-added services. While research on investment decisions primarily focuses on investors’ selection criteria and decision-making processes, our study addresses the gap concerning the founders’ perspective. Using Generative Pre-Trained Transformers (GPT) for text classification on a dataset of 8,561 online VC reviews, we extract 9,229 unique value-criteria from founders’ perspectives. A text-embedding cluster method categorizes these criteria into 26 categories. By analyzing additional startup lifecycle data, we determine which value-criteria are crucial at different startup stages. Our findings reveal that investors’ “general social skills” are the most important value-criteria across all startup stages, while more mature startups prioritize more self-serving criteria focused on growth and long-term relationships. Additionally, we observe that investors mostly fulfill founders’ value-criteria, with “general advice” being particularly well-executed.

Venture Capital, Founders, NLP, GPT

TUHH Institute of Entrepreneurship
Prof. Dr. Christoph Ihl
Am Irrgarten 3
21073 Hamburg
Contact
  • startup.engineer@tuhh.de
  • +49 (0)40 42878-3226