Brief Overview of Netpeak Spider 3.2: JavaScript Rendering and Express Audit in PDF

Folks, the Netpeak Software team is happy to present the brand-new Netpeak Spider 3.2 with even more awesome features for SEO specialists. In this post, I’ll tell you about the new functionality and other changes in the tool.

1. JavaScript Rendering

We are glad to announce that we have implemented one of the most long-awaited features in Netpeak Spider – JavaScript rendering.

Nowadays more and more sites use JS frameworks to display content, and it’s impossible to crawl such content without executing the JS scripts. That’s why we’ve added JavaScript rendering in Netpeak Spider 3.2: now you can crawl websites that use both CSR (client-side rendering) and SSR (server-side rendering).
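
To illustrate why rendering matters, here is a minimal Python sketch (not Netpeak Spider code) that fetches a page without executing JavaScript; on a CSR site, content injected by the JS framework will simply be missing from the response. The URL and the text being checked are placeholders.

```python
# Minimal sketch: fetching a page without executing JavaScript returns only
# the initial HTML. On a client-side rendered (CSR) site, content built by
# JS frameworks will not appear in this raw response.
from urllib.request import Request, urlopen

url = "https://example.com/"  # placeholder URL
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req, timeout=10).read().decode("utf-8", errors="replace")

# On a CSR page this check typically fails, because the text is rendered
# by JavaScript in the browser rather than shipped in the HTML source.
print("content present in raw HTML:", "Some product name" in html)
```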

1.1. JavaScript Rendering in Netpeak Spider

To implement JS rendering in the tool, we’ve used one of the latest versions of the Chromium browser, which serves as the basis of the world-famous Google Chrome. Googlebot uses an older version, Chrome 41, which does not support several modern JavaScript features. That’s why JavaScript execution in Netpeak Spider is similar to the search robot’s behavior, but not identical.

To start crawling with JavaScript rendering:

  1. Go to the ‘General’ tab of crawling settings.
  2. Tick the 'Enable JavaScript rendering and set AJAX timeout, s' checkbox.

The ‘AJAX timeout’ parameter defines the time allotted for script execution after the page load. By default, it is set to 2 seconds, which is enough for full JS execution in most cases. When 2 seconds is not enough, you can set a custom value yourself.
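
Conceptually, the AJAX timeout is just a fixed pause between page load and the moment the rendered DOM is captured. The hedged sketch below shows the idea with Selenium and headless Chrome; Netpeak Spider uses its own embedded Chromium, so this is only an illustration, and the URL is a placeholder.

```python
# Conceptual sketch of an "AJAX timeout": wait a fixed number of seconds
# after page load so scripts can finish building the DOM, then read the
# rendered HTML. Not Netpeak Spider's actual implementation.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

AJAX_TIMEOUT = 2  # seconds, the same default as in Netpeak Spider

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/")   # placeholder URL
    time.sleep(AJAX_TIMEOUT)             # give JS time to build the DOM
    rendered_html = driver.page_source   # HTML after script execution
finally:
    driver.quit()
```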

Let’s take a look at the main peculiarities of JavaScript rendering in Netpeak Spider:

  • JS will only be executed for compliant HTML pages (returning the 200 OK status code).
  • The User Agent chosen in crawling settings is used.
  • Basic authentication is supported.
  • Rendering is limited to 25 threads. You can still set up to 100 simultaneous crawling threads, but no more than 25 compliant HTML pages will be rendered at a time.
  • Requests to analytics services (such as Google Analytics) are blocked so that crawling doesn’t distort your site analytics.
  • Cookies are taken into account regardless of the settings on the 'Advanced' tab.
  • Iframes and images aren’t loaded.
  • A list of proxies is supported.

We highly recommend crawling sites using JavaScript rendering only when it’s necessary. Remember that this process increases crawling duration and resource consumption.

2. Express Audit of the Optimization Quality (PDF)

We set a goal to make data visualization in our desktop tool as good as in the most advanced online tools. That’s why we’ve created a brand-new PDF report with an express audit of the optimization quality.

The report itself is an extended version of the program dashboard with detailed data for a site audit. It contains only data found during crawling, so you won’t see empty tables for data that wasn’t found. Also, instead of endless URL lists, you will see illustrative examples for data analysis.

The report primarily contains data useful for SEO teams, but it will also help sales teams quickly evaluate the strengths and weaknesses of a project. You can add your own recommendations to the report and send it to your client or colleagues for further actions.

You can save the report in two clicks: open the ‘Export’ menu and choose the first option.

The structure of the express audit is based on the data from the ‘All results’ table.

The report file contains the following sections:

  1. Title page. Here you will see a screenshot of the homepage and the domain name of the crawled website.
  2. Overview. It contains a numerical overview of the data in the report, the content types of internal and external URLs, and the main hosts.
  3. URL structure. It shows the main hosts, their segments, and data on the root documents.
  4. Status codes. Here you should pay attention to unavailable pages returning 4xx or higher status codes.
  5. Crawling and indexing. This section shows data on settings and instructions that affect website crawling and indexing.
  6. Click and URL depth. It will help you identify pages with click and URL depth greater than 4 (see the sketch after this list).
  7. Load speed. This report contains two important indicators: server response time for internal URLs and for external URLs.
  8. HTTP/HTTPS protocols. With the help of this report, you can easily detect resources with mixed content.
  9. Content optimization. It shows the following data:
    • Presence and uniqueness of the Title, Description, H1
    • The number of characters and words on a page
    • Image size
  10. Issues. This report displays the number of pages containing issues of different severity, top major issues hampering your SEO, and the full list of all found issues with examples.
  11. Terms. Here you will see descriptions of all important terms used in this report.
  12. Settings. In the last section, you can see all the settings and parameters used to create this report.
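
Since the ‘Click and URL depth’ section flags pages deeper than 4, here is a small illustrative sketch of one common way to count URL depth: the number of segments in the URL path. Click depth, by contrast, is the number of clicks from the start page and can only be derived from the crawl graph; the function below is not Netpeak Spider’s actual logic.

```python
# Illustrative only: URL depth counted as the number of path segments.
# Click depth cannot be computed from the URL alone.
from urllib.parse import urlparse

def url_depth(url: str) -> int:
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

print(url_depth("https://example.com/"))                     # 0
print(url_depth("https://example.com/blog/2019/04/post/"))   # 4 (not flagged)
print(url_depth("https://example.com/a/b/c/d/e.html"))       # 5 (greater than 4)
```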

We have recorded a short video to showcase our express audit. We recommend watching it in full-screen mode to catch all the details.

3. Detailed Issue Description with Export

To ease data perception in Netpeak Spider, all issues are divided into groups according to their severity, and all parameters are briefly described. In this update, we went a step further and added an enhanced description of each issue.

If you click on any issue in the sidebar, the ‘Info’ panel will show you the following data:

  • the threat the issue poses
  • how to fix it
  • useful links for a deeper understanding of the problem

Now it should be much easier for newbies to understand how to fix SEO issues on a website. To make this feature useful for experienced specialists as well, we’ve added the ability to export the overview of all found issues with their detailed descriptions.

This report is called 'Issue overview + descriptions'. It can be found in the 'Export' → 'Issue Reports' tab, and in the following bulk exports:

  • 'Main reports set'
  • 'All issues'
  • 'All available reports (main + XL)'

We hope it will help you describe the game plan to your client and draw up a technical specification for developers.

4. Changes in Issues and Parameters

We’ve changed the names of a couple of issues:

  • Broken Links → Broken Pages
  • Duplicate Canonicals → Identical Canonical URLs

We’ve also adjusted the severity of the following issues:

Warnings → notices:

  • Multiple H1

Errors → notices:

  • Bad Base Tag Format
  • Max URL Length

Warnings → errors:

  • 5xx Error Pages: Server Error
  • Canonical Chain
  • Duplicate H1
  • Bad AMP HTML Format

Determination logic and sorting of issues were also changed:

  • The 'Bad Base Tag Format' issue: previously, a relative URL in this tag was considered an issue. Now the issue is shown if the href attribute contains a badly formatted URL.
  • The 'Canonical URL' parameter: by default, only an absolute URL in the canonical instruction is now taken into account, in line with Google guidelines. If a relative one is set, the (NULL) value will be shown in the table (see the sketch after this list). However, you can enable crawling of relative canonical URLs on the ‘Advanced’ tab of crawling settings.
  • We’ve changed the issue sorting: the most important and widespread issues now come first.
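
As a rough illustration of the new canonical logic, here is a hedged Python sketch: only an absolute URL in the canonical instruction is taken into account by default, a relative one yields None (analogous to the (NULL) value in the table), and an optional flag mimics the ‘Advanced’ setting. The function name and flag are hypothetical, not the tool’s internals.

```python
# Sketch of the canonical handling described above: absolute canonical URLs
# are accepted by default, relative ones return None unless explicitly
# allowed (then they are resolved against the page URL). Names are hypothetical.
from typing import Optional
from urllib.parse import urljoin, urlparse

def canonical_url(href: str, page_url: str, allow_relative: bool = False) -> Optional[str]:
    parsed = urlparse(href)
    if parsed.scheme and parsed.netloc:   # absolute URL: always accepted
        return href
    if allow_relative:                    # optional 'Advanced'-style behaviour
        return urljoin(page_url, href)
    return None                           # relative canonical ignored -> (NULL)

page = "https://example.com/category/page.html"
print(canonical_url("https://example.com/page.html", page))    # accepted
print(canonical_url("/page.html", page))                       # None -> (NULL)
print(canonical_url("/page.html", page, allow_relative=True))  # resolved
```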

5. Other Changes

  • Since we used the .NET 4.5.2 framework to implement JavaScript rendering, the new Netpeak Spider works only on Windows 7 SP1 and later (older versions do not support this framework).
  • The algorithm determining internal addresses for a list of URLs has been changed: addresses with the same domain are now considered internal, and any URL with a different domain is considered external (see the sketch after this list). Previously, during the crawling of a URL list, all links were considered external.
  • The ‘Default’ parameters template is improved.
  • Working with robots.txt has been optimized: only one request to the file is sent for each host at the start of crawling. Previously, with many threads, lots of requests used to be sent to the same robots.txt.
  • Changes in results sorting: the sorting is now kept only for the current session.
  • New naming logic for saved projects and reports: names now contain the host of the initial URL or the first paragraph of a table.
  • The 'Allow cookies' setting is enabled by default: it helps prevent crawling issues caused by ignored cookies.
  • Improved notifications: the window is now displayed for 60 seconds and does not make the Netpeak Spider window active.
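
For the new internal/external logic when crawling a list of URLs, here is a simplified sketch under one possible interpretation: the host of the first URL in the list is taken as the reference, addresses on the same host are treated as internal, and everything else as external. The helper name and the reference-host choice are assumptions for illustration only, not the tool’s actual algorithm.

```python
# Simplified sketch of splitting a URL list into internal and external
# addresses by host. The reference host is taken from the first URL here,
# which is an assumption for illustration purposes.
from urllib.parse import urlparse

def split_internal_external(urls):
    if not urls:
        return [], []
    base_host = urlparse(urls[0]).netloc.lower()
    internal = [u for u in urls if urlparse(u).netloc.lower() == base_host]
    external = [u for u in urls if urlparse(u).netloc.lower() != base_host]
    return internal, external

urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://other-site.com/c",   # different domain -> external
]
internal, external = split_internal_external(urls)
print(internal)
print(external)
```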

In a Nutshell

We’ve added new features in version 3.2 of Netpeak Spider so you can do even more tasks with our crawler. The tool now renders JavaScript, creates an express audit of the optimization quality in PDF, provides a detailed description of each issue, and lets you export these descriptions for further analysis.

Check out all the details on the 50 improvements in the brand-new Netpeak Spider in the release post on the Netpeak Software blog.
