Creating a crawler

Crawlers are ordered by customers. When a new crawler is needed, we start with an issue in the create-new-scrapy-crawler project. The subject should contain the name of the crawler. The name should match the regular expression /[a-z0-9-]+/, that is, consist only of lowercase letters, digits, and hyphens. Use the same name for the repository, the JSON file, and the Scrapy crawler.

Subject: create crawler <spidername>
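
The naming rule can be checked before the issue is opened. The following is a minimal sketch, not part of any existing tooling: the helper names is_valid_spider_name and issue_subject and the sample name acme-jobs are made up for illustration. It assumes the regular expression is meant to cover the whole name, so it uses re.fullmatch.

```python
import re

# Pattern taken from the naming rule; fullmatch applies it to the whole string.
SPIDER_NAME_RE = re.compile(r"[a-z0-9-]+")


def is_valid_spider_name(name: str) -> bool:
    """Return True if name consists only of lowercase letters, digits and hyphens."""
    return SPIDER_NAME_RE.fullmatch(name) is not None


def issue_subject(name: str) -> str:
    """Build the issue subject for a new crawler (hypothetical helper)."""
    if not is_valid_spider_name(name):
        raise ValueError("invalid crawler name: {!r}".format(name))
    return "create crawler {}".format(name)


if __name__ == "__main__":
    # "acme-jobs" is an invented example name.
    print(issue_subject("acme-jobs"))
```

A name such as Acme_Jobs is rejected because it contains an uppercase letter and an underscore.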

Developers are asked whether they can take on the task. The developer who takes it should assign themselves to the issue and start working. They should create a repository in the JobCrawler group. The new repository should be named <spidername>.


 root@scrapy-runner:~# scrapy startproject spidername
 New Scrapy project 'spidername', using template directory '/usr/local/lib/python3.5/dist-packages/scrapy/templates/project', created in:

 You can start your first spider with:
     cd spidername
     scrapy genspider example

 root@scrapy-runner:~/spidername# scrapy genspider spidername
 Cannot create a spider with the same name as your project

Why can't a spider have the same name as the project? Please complete the example.
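
One likely explanation, offered here as an assumption and not as a statement from the Scrapy documentation: genspider turns the spider name into a Python module name and stops when that module name equals the project name, presumably because the spider module and the project package would then be hard to tell apart on import. The sketch below imitates such a check in plain Python. The function names are hypothetical and this is not Scrapy's own code.

```python
import string


def to_module_name(spider_name: str) -> str:
    """Approximate how a spider name becomes a module name (assumption)."""
    module = spider_name.replace("-", "_").replace(".", "_")
    # Prefix names that do not start with an ASCII letter.
    if module and module[0] not in string.ascii_letters:
        module = "a" + module
    return module


def conflicts_with_project(project_name: str, spider_name: str) -> bool:
    """Return True if the spider's module name equals the project name."""
    return to_module_name(spider_name) == project_name


if __name__ == "__main__":
    # Same names as in the console session: project and spider both "spidername".
    print(conflicts_with_project("spidername", "spidername"))
```

With the names from the console session, conflicts_with_project("spidername", "spidername") is true, which corresponds to the error message shown there.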

Please follow the PEP 8 Style Guide for Python Code.

Please add a .gitlab-ci.yml file to your repository. Example:

   - export PATH=$PATH:/usr/local/bin

This checks the code against the PEP 8 style guide and runs the crawler on the test and deployment machine whenever changes are pushed to master.

In addition, create a .gitignore file:


# ignore .idea and build directory

Crawlers must be version-controlled with Git. The location in our GitLab is: