Job Summary
Key Responsibilities
- Develop and maintain data ingestion pipelines for diverse sources.
- Optimize search functionality and indexing strategies.
- Handle PDF processing at scale and ensure data quality.
- Build and maintain reliable REST APIs for seamless data access.
Who This Role Suits
- Candidates with a strong background in backend engineering.
- Professionals experienced in operating data-heavy systems.
- Individuals comfortable working with Node.js, TypeScript, and AWS services.
Tips to Apply
- Highlight your experience in designing and maintaining data pipelines.
- Demonstrate proficiency in full-text search systems and REST API development.
- Include familiarity with academic publishing formats, PDF processing, and web scraping in your application.
Full Job Description
Paperpile runs on data at scale, with a literature database of 250M+ academic papers and a growing body of user data accumulated over more than a decade. You'll work across the systems that ingest, process, store, and serve this data reliably: building pipelines, optimizing search, handling PDFs at scale, and exposing clean APIs.
Requirements
- Strong backend engineering background with experience building and operating data-heavy systems in production.
- Experience deploying and operating services on AWS.
- Experience designing and maintaining data ingestion pipelines handling messy, heterogeneous sources. Comfortable with web scraping and working with third-party data sources and APIs.
- Familiarity with Node.js and TypeScript. It's fine if you come from a different background, such as Java or Python, but you should be comfortable working in this environment.
- High standards for data quality. You think carefully about correctness, deduplication, and consistency.
- Solid understanding of full-text search systems including indexing strategy, relevance tuning, and query optimization.
- Proficient in building reliable REST APIs.
More useful experience
- Familiarity with academic publishing formats and data sources (PubMed, Crossref, arXiv…).
- Experience with PDF processing pipelines (extraction, transformation, storage and delivery at scale).
- Experience with LLM-based document processing or ML pipelines for extracting structured data from unstructured text.
- Large-scale web crawling and scraping.
Compensation
- Base compensation €60,000–€90,000 based on the level of your experience
- Bonus/equity program.
Please mention the word **NOURISH** and tag RNTEuNzcuMjE3LjEyNw== when applying to show you read the job post completely (#RNTEuNzcuMjE3LjEyNw==). This is a beta feature to avoid spam applicants. Companies can search these words to find applicants that read this and see they're human.