Python Developer (Data Extraction)

Remote
Full-time
Working hours aligned with the London time zone (GMT)

We’re looking for a highly detail-oriented Python Developer to join our client’s team. Our client is a technology company focused on extracting and structuring large volumes of information from publicly available online resources.

The company works with big data pipelines that selectively collect information from various public sources. This may involve web scraping, document processing, or integrations with available data interfaces.

The core of the work revolves around continuously extracting and structuring large volumes of data from diverse sources and formats.

Responsibilities

  • Develop and maintain Python-based data extraction pipelines for processing large volumes of documents
  • Write Python code to extract structured data from scanned and semi-structured documents, mostly containing financial information
  • Implement logic for data parsing, normalization, and validation
  • Handle thousands of documents with different formats and varying scan quality
  • Improve and optimize existing data extraction algorithms and workflows
  • Write and maintain automated tests using Pytest to ensure extraction accuracy
  • Create validation logic and test cases to maintain high data quality standards
  • Collaborate with the team to improve document processing and data extraction approaches

What this role is NOT

  • Not a web development role
  • Not focused on building APIs or API integrations
  • Not a typical backend product development position

This role is focused specifically on data extraction, processing, and validation.

What candidates need to be successful

  • 4+ years of professional experience with Python
  • English at B2 level or higher
  • Strong experience writing clean, maintainable, and testable Python code
  • Experience developing data extraction, parsing, or scraping solutions
  • Hands-on experience working with semi-structured or unstructured data (e.g., PDFs, scanned documents, OCR outputs)
  • Strong skills in data validation, data quality checks, and analytical problem-solving
  • Experience designing test cases and automated tests using Pytest
  • Ability to work with large volumes of data and complex document formats
  • Exceptional attention to detail and accuracy, especially when working with financial or sensitive data
  • Experience debugging and improving data extraction logic and parsing algorithms
  • Familiarity with AWS Lambda and Step Functions
  • Experience working with document processing or OCR pipelines
  • Experience building data processing pipelines or ETL workflows
  • Familiarity with large-scale data processing or big data environments

If you have relevant experience and strong attention to detail, we’d love to hear from you. Apply and let’s discuss the opportunity!
