Format

All Scrapy crawlers should produce a JSON file in the following format. The field names are defined by the JobEntity of the YAWIK project.

{
    "jobs": [
        {
            "id": "Example-123/456",
            "title": "title of the job posting",
            "location": "location of the job posting",
            "company": "name of the hiring organization",
            "description": "description of the job posting",
            "reference": "reference of the job posting used by the hiring organization",
            "contactEmail": "email address for applications (if available)",
            "language": "language of the job posting",
            "link": "link to the detail page of the job posting",
            "datePublishStart": "start date of the job posting (date format DD.MM.YYYY)",
            "datePublishEnd": "end date of the job posting (date format DD.MM.YYYY)",
            "logoRef": "link to a logo of the hiring organization",
            "linkApply": "link which references an application form",
            "classifications": {
                "professions": [
                    "software-developer",
                    "sales manager"
                ],
                "industries": [
                    "banking",
                    "IT"
                ],
                "employmentTypes": [
                    "contract",
                    "internship",
                    "freelancer"
                ]
            },
            "templateValues": {
                "description": "<p>We're a good company<\/p>",
                "tasks": "<b>Your Tasks<\/b><ul><li>Task 1<\/li><li>Task 2<\/li><\/ul>",
                "requirements": "<b>Qualifications<\/b><ul><li>requirement 1<\/li><li>requirement 2<\/li><\/ul>",
                "benefits": "<b>We offer<\/b><ul><li>offer 1<\/li><li>offer 2<\/li><\/ul>",
                "html": "<p>complete html<\/p>"
            }
        },
        {
            .....
        }
    ]
}
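As a rough sketch of producing this file from Python (the language Scrapy spiders are written in), the snippet below wraps a list of job dicts in the top-level "jobs" key. The job values are made-up placeholders, and `to_json` and `write_jobs` are hypothetical helper names, not part of Scrapy or YAWIK. Note that `\/` in the example above is only an optional JSON escape for `/`: `json.dumps` emits a plain `/`, and both decode to the same string. Scrapy's built-in JSON feed export writes a bare top-level array, so if you rely on it, the "jobs" wrapper would have to be added separately, for example in an item pipeline.

```python
import json

# Hypothetical example entry; field names follow the format described above.
job = {
    "id": "Example-123/456",  # ids must not contain dots
    "title": "Software Developer (m/f/d)",
    "location": "Frankfurt am Main",
    "link": "https://example.com/jobs/123",
    "datePublishStart": "01.03.2019",
    "classifications": {
        "professions": ["software-developer"],
        "employmentTypes": ["contract"],
    },
    "templateValues": {
        "description": "<p>We're a good company</p>",
    },
}


def to_json(jobs):
    """Serialize crawled jobs as {"jobs": [...]}; keeps non-ASCII characters readable."""
    return json.dumps({"jobs": jobs}, ensure_ascii=False, indent=2)


def write_jobs(jobs, path):
    """Write the JSON document to a UTF-8 encoded file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(to_json(jobs))
```

Calling `write_jobs([job], "jobs.json")` would write a single-entry file in the format above.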

Field      Description
id         unique identifier; must not contain dots ‘.’
title      title of the job posting
location   location of the job posting
link       link to the detail page of the job posting

The fields id, title and link are required. All other fields are optional: if additional data can be crawled, add it using the fields of the JSON format described above.
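The required-field rules can be checked before a job is written out. The sketch below assumes every value is a string, as in the example above, and `validate_job` is a hypothetical helper; it returns a list of problems rather than raising, so a crawler can log and skip bad entries.

```python
REQUIRED_FIELDS = ("id", "title", "link")


def validate_job(job):
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = job.get(field)
        # Required fields must be present and non-blank strings.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing required field: {field}")
    # The id must not contain dots.
    if "." in str(job.get("id", "")):
        problems.append("id must not contain dots")
    return problems
```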

The fields datePublishStart and datePublishEnd should be in the format DD.MM.YYYY (for example, 24.12.2018).
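Job sites often publish dates in other formats, so a crawler usually has to convert them. A minimal sketch using the standard `datetime` module, assuming the scraped date is ISO formatted unless another `strptime` pattern is passed (the helper names are hypothetical):

```python
from datetime import date, datetime

DATE_FORMAT = "%d.%m.%Y"  # DD.MM.YYYY


def format_publish_date(value):
    """Render a date object as DD.MM.YYYY for datePublishStart/datePublishEnd."""
    return value.strftime(DATE_FORMAT)


def normalize_publish_date(raw, source_format="%Y-%m-%d"):
    """Parse a scraped date string in source_format and return it as DD.MM.YYYY."""
    return format_publish_date(datetime.strptime(raw.strip(), source_format).date())
```

For example, `normalize_publish_date("03/01/2019", "%m/%d/%Y")` converts a US-style date to "01.03.2019".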

Basic crawlers

A basic crawler extracts the fields that are available on the job list (overview) page. These are:

  • id
  • title
  • link
  • location (if available in the overview)

A basic crawler is built as a proof of concept, to show that the site can be crawled at all.
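In a real spider this extraction would use Scrapy selectors; to keep the idea self-contained, the sketch below uses only the standard library's `HTMLParser` instead. It assumes hypothetical markup where each job on the overview page is an `<a class="job-link" href="...">` element, and it derives the id from the last path segment of the link, replacing dots because ids must not contain them. Both the markup and the id scheme are assumptions to be adapted per site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class JobListParser(HTMLParser):
    """Collect (href, title) pairs from <a class="job-link"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "job-link" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a job link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace in the link text to get the title.
            title = " ".join("".join(self._text).split())
            self.entries.append((self._href, title))
            self._href = None


def basic_jobs(html, base_url):
    """Build basic job entries (id, title, link) from a job list page."""
    parser = JobListParser()
    parser.feed(html)
    jobs = []
    for href, title in parser.entries:
        link = urljoin(base_url, href)
        # Hypothetical id scheme: last path segment, dots replaced by dashes.
        job_id = urlparse(link).path.rstrip("/").rsplit("/", 1)[-1].replace(".", "-")
        jobs.append({"id": job_id, "title": title, "link": link})
    return jobs
```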

Full crawlers

A full crawler should extract all data that is available from both the job listing and the job detail page. Full crawlers are only built once it is clear who pays for the work.
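One way a full crawler might combine the two pages is to merge the overview entry with the fields scraped from the detail page. The sketch below is a hypothetical policy, not a YAWIK requirement: detail-page values fill in or refine overview values, id and link are always taken from the overview so the entry stays traceable, and empty values or keys outside the format are dropped.

```python
# Top-level field names taken from the format above.
ALLOWED_FIELDS = {
    "id", "title", "location", "company", "description", "reference",
    "contactEmail", "language", "link", "datePublishStart", "datePublishEnd",
    "logoRef", "linkApply", "classifications", "templateValues",
}


def full_job(overview, detail):
    """Merge overview and detail-page fields into one job entry (hypothetical helper)."""
    merged = {**overview, **detail}  # detail-page values win on conflicts
    for key in ("id", "link"):
        merged[key] = overview[key]  # identity always comes from the overview
    # Drop fields the format does not define, and empty values.
    return {
        k: v
        for k, v in merged.items()
        if k in ALLOWED_FIELDS and v not in (None, "", [], {})
    }
```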