| 1 | +## Scrape single-page in TypeScript template |
| 2 | + |
| 3 | +A template for scraping data from a single web page in TypeScript (Node.js). The URL of the web page is passed in via input, which is defined by the [input schema](https://docs.apify.com/platform/actors/development/input-schema). The template uses the [Axios client](https://axios-http.com/docs/intro) to get the HTML of the page and the [Cheerio library](https://cheerio.js.org/) to parse the data from it. The data are then stored in a [dataset](https://docs.apify.com/sdk/js/docs/guides/result-storage#dataset) where you can easily access them. |
| 4 | + |
| 5 | +This template scrapes the page headings, but you can easily edit the code to scrape whatever you want from the page. |
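As a rough illustration of the input handling described above, the value returned by `Actor.getInput()` could be narrowed and checked before any request is made. This is only a sketch under assumptions: the `Input` interface, the field name `url`, and the helper `parseInput` are hypothetical names chosen for the example, not part of the template or the Apify SDK. It uses only the built-in `URL` class.

```typescript
// Hypothetical input shape; the field name `url` is an assumption for illustration.
interface Input {
    url: string;
}

// Narrow an unknown value (for example, the result of `Actor.getInput()`) to `Input`.
// `parseInput` is a hypothetical helper, not an Apify SDK function.
function parseInput(raw: unknown): Input {
    if (typeof raw !== 'object' || raw === null) {
        throw new Error('Input is missing.');
    }
    const { url } = raw as { url?: unknown };
    if (typeof url !== 'string') {
        throw new Error('Input field "url" must be a string.');
    }
    // The built-in URL constructor throws a TypeError on malformed URLs.
    const parsed = new URL(url);
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        throw new Error('Input field "url" must use http or https.');
    }
    // `href` is the normalized form of the URL.
    return { url: parsed.href };
}

console.log(parseInput({ url: 'https://apify.com' }).url);
```

On the platform, the input schema already validates the input, so a check like this mainly helps during local runs where the input file is edited by hand.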
| 6 | + |
| 7 | +## Included features |
| 8 | + |
| 9 | +- **[Apify SDK](https://docs.apify.com/sdk/js/)** - a toolkit for building [Actors](https://apify.com/actors) |
| 10 | +- **[Input schema](https://docs.apify.com/platform/actors/development/input-schema)** - define and easily validate a schema for your Actor's input |
| 11 | +- **[Dataset](https://docs.apify.com/sdk/js/docs/guides/result-storage#dataset)** - store structured data where each object stored has the same attributes |
| 12 | +- **[Axios client](https://axios-http.com/docs/intro)** - promise-based HTTP Client for Node.js and the browser |
| 13 | +- **[Cheerio](https://cheerio.js.org/)** - library for parsing and manipulating HTML and XML |
| 14 | + |
| 15 | +## How it works |
| 16 | + |
| 17 | +1. `Actor.getInput()` gets the input where the page URL is defined |
| 18 | +2. `axios.get(url)` fetches the page |
| 19 | +3. `cheerio.load(response.data)` loads the page data and enables parsing the headings |
| 20 | +4. The selector below matches the headings on the page; edit it to parse whatever you need from the page: |
| 21 | + |
| 22 | + ```javascript |
| 23 | + $("h1, h2, h3, h4, h5, h6").each((_i, element) => {...}); |
| 24 | + ``` |
| 25 | + |
| 26 | +5. `Actor.pushData(headings)` stores the headings in the dataset |
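To make the shape of the stored data more concrete, the heading extraction in steps 3 and 4 can be approximated without any dependencies. The sketch below is not the template's code: it swaps Cheerio for a regular expression so that it runs without installing packages, and the `Heading` record shape (`level`, `text`) and the `extractHeadings` helper are assumptions made for the example. For real pages, a proper HTML parser such as Cheerio remains the better choice.

```typescript
// Record shape for one heading; the field names are assumptions for illustration.
interface Heading {
    level: string; // tag name, e.g. "h1"
    text: string; // text content with nested tags removed
}

// Dependency-free stand-in for the Cheerio selector "h1, h2, h3, h4, h5, h6".
// A regular expression is enough for a demo, but it is not a real HTML parser.
function extractHeadings(html: string): Heading[] {
    const pattern = /<(h[1-6])\b[^>]*>([\s\S]*?)<\/\1>/gi;
    const headings: Heading[] = [];
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(html)) !== null) {
        const tag = match[1] ?? '';
        const inner = match[2] ?? '';
        // Drop nested tags, collapse whitespace, and trim the result.
        const text = inner.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
        headings.push({ level: tag.toLowerCase(), text });
    }
    return headings;
}

const sampleHtml = '<h1>Title</h1><p>Intro</p><h2 class="sub">Sub <em>heading</em></h2>';
console.log(extractHeadings(sampleHtml));
```

For the sample HTML, the function returns two records, one for the `h1` and one for the `h2`. An array of this kind is what step 5 passes to `Actor.pushData()`.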
| 27 | + |
| 28 | +## Resources |
| 29 | + |
| 30 | +- [Web scraping in Node.js with Axios and Cheerio](https://blog.apify.com/web-scraping-with-axios-and-cheerio/) |
| 31 | +- [Web scraping with Cheerio in 2023](https://blog.apify.com/web-scraping-with-cheerio/) |
| 32 | +- [Video tutorial](https://www.youtube.com/watch?v=yTRHomGg9uQ) on building a scraper using CheerioCrawler |
| 33 | +- [Written tutorial](https://docs.apify.com/academy/web-scraping-for-beginners/challenge) on building a scraper using CheerioCrawler |
| 34 | +- [Integration with Zapier](https://apify.com/integrations), Make, Google Drive, and others |
| 35 | +- [Video guide on getting scraped data using Apify API](https://www.youtube.com/watch?v=ViYYDHSBAKM) |
| 36 | +- [A short guide on how to build web scrapers using code templates](https://www.youtube.com/watch?v=u-i-Korzf8w) |
| 39 | + |
| 40 | +## Getting started |
| 41 | + |
| 42 | +For complete information, [see this article](https://docs.apify.com/platform/actors/development#build-actor-locally). To run the Actor, use the following command: |
| 43 | + |
| 44 | +```bash |
| 45 | +apify run |
| 46 | +``` |
| 47 | + |
| 48 | +## Deploy to Apify |
| 49 | + |
| 50 | +### Connect Git repository to Apify |
| 51 | + |
| 52 | +If you've created a Git repository for the project, you can easily connect it to Apify: |
| 53 | + |
| 54 | +1. Go to the [Actor creation page](https://console.apify.com/actors/new) |
| 55 | +2. Click the **Link Git Repository** button |
| 56 | + |
| 57 | +### Push a project from your local machine to Apify |
| 58 | + |
| 59 | +You can also deploy the project from your local machine to Apify without a Git repository. |
| 60 | + |
| 61 | +1. Log in to Apify. You will need to provide your [Apify API Token](https://console.apify.com/account/integrations) to complete this action. |
| 62 | + |
| 63 | + ```bash |
| 64 | + apify login |
| 65 | + ``` |
| 66 | + |
| 67 | +2. Deploy your Actor. This command will deploy and build the Actor on the Apify Platform. You can find your newly created Actor under [Actors -> My Actors](https://console.apify.com/actors?tab=my). |
| 68 | + |
| 69 | + ```bash |
| 70 | + apify push |
| 71 | + ``` |
| 72 | + |
| 73 | +## Documentation reference |
| 74 | + |
| 75 | +To learn more about Apify and Actors, take a look at the following resources: |
| 76 | + |
| 77 | +- [Apify SDK for JavaScript documentation](https://docs.apify.com/sdk/js) |
| 78 | +- [Apify SDK for Python documentation](https://docs.apify.com/sdk/python) |
| 79 | +- [Apify Platform documentation](https://docs.apify.com/platform) |
| 80 | +- [Join our developer community on Discord](https://discord.com/invite/jyEM2PRvMU) |