
Netpeak Spider 2.1 review: classification of issues, parameter selection, and new results arrangement logic

About a month ago we announced the release of the new Netpeak Software products → Netpeak Spider 2.0 and Netpeak Checker 2.0. If you haven’t read that post yet, you can always turn to it for the full sequence of events. And now we are ready to present a new version of the product – Netpeak Spider 2.1. I have been using version 2.1 in beta-testing mode over the past few weeks, and, frankly speaking, I wouldn’t want to go back even to version 2.0. Once you’re spoiled, there’s no way back! :)

Netpeak Spider is becoming a real ‘machine for search engine optimization’, so we thought: why not call it an ‘SEO Terminator’?

Meet Netpeak Spider 2.1 – a program aimed at detecting and eliminating on-page SEO issues. We want August 4, 2016 to stay in your memory as ‘Crawling Day’.

1. Classification of issues

The new version detects more than 50 types of issues, prioritized as follows:

  • Error → critical issues
  • Warning → important, but not critical issues
  • Notice → issues you should pay attention to

Now, on the right side of the program you can see the ‘Issues’ panel – this is where all the issues found during crawling are presented. We had hardly analyzed 500 pages of the Amazon.com website when 37 issues were discovered, prioritized by their severity.

The issues list will be continuously extended and refined, but for now it looks like this:

Errors

  • Duplicate Pages* → all pages that have the same page hash value. URLs in this report are grouped by page hash
  • Duplicate Body Content* → all pages that have the same hash value of the <body> section. URLs in this report are grouped by page body hash
  • Duplicate Titles* → all pages with title tags that appear on more than one page of the crawled website. URLs in this report are grouped by title tag
  • Missing or Empty Title → all pages without a title tag or with an empty one
  • Duplicate Descriptions* → all pages with meta description tags that appear on more than one page of the crawled website. URLs in this report are grouped by meta description tag
  • Missing or Empty Description → all pages without a meta description tag or with an empty one
  • 4xx Error Pages: Client Error → all pages that return a 4xx HTTP status code
  • Redirect to 4xx Error Page → all pages that redirect to 4xx error pages, such as 404 Not Found
  • Endless Redirect → all pages that redirect to themselves and thereby generate an infinite redirect loop
  • Max Redirections → all pages that redirect more than 4 times (by default). Note that you can change the maximum number of redirects in the ‘Restriction’ tab of the crawling settings
  • Connection Error → all pages that failed to respond as a result of a connection error
  • Max URL Length → all pages with more than 2,000 characters in the URL
  • Missing Internal Links → all pages with no internal links. Note that such pages receive link juice but do not pass it on
  • Broken Images → images that return a 4xx-5xx status code. Note that the ‘Images’ content type should be checked in the ‘General’ tab of the crawling settings to enable this issue detection
Warnings

  • Multiple Titles → all pages with more than one title tag
  • Multiple Descriptions → all pages with more than one meta description tag
  • Missing or Empty h1 → all pages without an h1 header tag or with an empty one
  • Multiple h1 → all pages with more than one h1 header tag
  • Duplicate h1* → all pages with h1 header tags that appear on more than one page of the crawled website. URLs in this report are grouped by h1 header tag value
  • Duplicate Canonical URLs* → all pages with canonical URLs that appear on more than one page of the crawled website. URLs in this report are grouped by canonical URL
  • Min Content Size → all pages with fewer than 500 characters in the <body> section (excluding HTML tags)
  • 3xx Redirected Pages → all pages that return a 3xx redirection status code
  • Non-301 Redirects → all pages that return a redirection status code other than 301 (permanent redirect)
  • Redirect Chain → all pages that redirect more than once
  • Meta Refresh Redirected → all pages with a redirect in the <meta http-equiv="refresh"> tag in the <head> section
  • Blocked by Robots.txt → all pages that are disallowed in the robots.txt file
  • Blocked by Meta Robots → all pages that contain the <meta name="robots" content="noindex"> directive in the <head> section
  • Blocked by X-Robots-Tag → all pages that contain the ‘noindex’ directive in the X-Robots-Tag of the HTTP response header
  • Internal Nofollowed Links → all pages that contain internal links with the rel="nofollow" attribute
  • Missing Images ALT Attributes → all pages that contain images without the alt attribute. To view the report, click the ‘Current Table Summary’ button, choose ‘Images’ and set the appropriate filter (Include → URLs with issue → Missing Images ALT Attributes)
  • Max Image Size → images whose size exceeds 100 kB. Note that the ‘Images’ box should be checked in the ‘General’ tab of the crawling settings to enable this issue detection
  • 5xx Error Pages: Server Error → all pages that return a 5xx HTTP status code
  • Long Server Response Time → all pages with a response time of more than 500 ms
  • Other Failed URLs → all pages that failed to respond as a result of other, unknown errors
Notices

  • Same Title and h1 → all pages that have identical title and h1 header tags
  • Max Title Length → all pages with a title tag of more than 70 characters
  • Short Title → all pages with a title tag of fewer than 10 characters
  • Max Description Length → all pages with a meta description tag of more than 160 characters
  • Short Description → all pages with a meta description tag of fewer than 50 characters
  • Max h1 Length → all pages with an h1 header tag of more than 65 characters
  • Max HTML Size → all pages with more than 200,000 characters in the <html> section (including HTML tags)
  • Max Content Size → all pages with more than 50,000 characters in the <body> section (excluding HTML tags)
  • Min Text/HTML Ratio → all pages with a text to HTML ratio of less than 10 percent
  • Nofollowed by Meta Robots → all pages that contain the <meta name="robots" content="nofollow"> directive in the <head> section
  • Nofollowed by X-Robots-Tag → all pages that contain the ‘nofollow’ directive in the X-Robots-Tag of the HTTP response header
  • Missing or Empty Canonical Tag → all pages without a canonical URL or with an empty one
  • Different Page URL and Canonical URL → all pages where the canonical URL differs from the page URL
  • Max Internal Links → all pages with more than 100 internal links
  • Max External Links → all pages with more than 10 external links
  • External Nofollowed Links → all pages that contain external links with the rel="nofollow" attribute
  • Missing or Empty Robots.txt File → all URLs related to a missing or empty robots.txt file. Note that different subdomains can have different robots.txt files

*Good news: all duplicate searches are carried out in real time, which means you don’t have to run a separate tool for them → choose the necessary parameters, start crawling and enjoy! :)
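
To make the logic behind a few of these checks more tangible, here is a minimal Python sketch (our own illustration under stated assumptions, not Netpeak Spider’s actual code) that reproduces duplicate grouping by hash and a couple of the length/emptiness checks on a tiny set of hypothetical pages:

```python
import hashlib
from collections import defaultdict

# Hypothetical sample input: URL → extracted title, meta description, and raw <body> markup
pages = {
    "https://example.com/a": {"title": "Red shoes", "description": "",
                              "body": "<p>Red shoes</p>"},
    "https://example.com/b": {"title": "Red shoes", "description": "Buy red shoes",
                              "body": "<p>Red shoes</p>"},
    "https://example.com/c": {"title": "B" * 80, "description": "Blue shoes",
                              "body": "<p>Blue shoes</p>"},
}

def body_hash(html):
    """Grouping key for duplicate content: SHA-1 of the raw <body> markup."""
    return hashlib.sha1(html.encode("utf-8")).hexdigest()

issues = defaultdict(list)
by_title, by_hash = defaultdict(list), defaultdict(list)

for url, page in pages.items():
    if not page["title"].strip():
        issues["Missing or Empty Title"].append(url)
    elif len(page["title"]) > 70:                      # threshold from the Notices group
        issues["Max Title Length"].append(url)
    if not page["description"].strip():
        issues["Missing or Empty Description"].append(url)
    by_title[page["title"]].append(url)
    by_hash[body_hash(page["body"])].append(url)

# Duplicates are simply groups that share the same title / body hash
for title, urls in by_title.items():
    if title.strip() and len(urls) > 1:
        issues["Duplicate Titles"].append(urls)
for digest, urls in by_hash.items():
    if len(urls) > 1:
        issues["Duplicate Body Content"].append(urls)

for name, hits in issues.items():
    print(name, hits)
```

The same grouping idea extends to descriptions, h1 tags, and canonical URLs: build a dictionary keyed by the value (or its hash) and report every key that maps to more than one URL.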

To better understand an issue, hover your cursor over it and view the tooltip. Note that all issues not found during crawling are moved to the bottom part of the new panel, to the ‘Not Detected Issues’ block. Issues whose detection is switched off are placed even lower, in the ‘Disabled Issues’ block.

If you’re an SEO expert, please do let us know: which other issues should be detected by Netpeak Spider?

2. New parameters and the option to select them

The new version lets you choose particular parameters for crawling, which directly affects crawling speed and RAM consumption. For example, parameters such as Links, Redirects, Headers, and Images are resource-intensive (this is mentioned in their settings) – try switching them off if you don’t need them for the current crawl. Altogether, Netpeak Spider 2.1 includes 44 parameters: for each of them you can see a description and the issues that may be detected.

In total, 24 new parameters have been added to Netpeak Spider 2.1:

General Parameters

  • Issues → number of all issues (errors, warnings, and notices) found on the target URL
  • X-Robots-Tag Instructions → content of the X-Robots-Tag in the HTTP response header: it contains instructions for search engine robots and is similar to the Meta Robots tag in the <head> section
  • Response Time → time (in milliseconds) taken by the website server to respond to a request; the same as Time To First Byte (TTFB)
  • Content Download Time → time (in milliseconds) taken by the website server to return the HTML code of the page
  • Redirect Target URL → target URL of a single redirect or a redirect chain, if one exists
  • Content-Length → content of the ‘Content-Length’ field in the HTTP response headers; used to indicate the response body length in octets (8-bit bytes)
  • Content-Encoding → content of the ‘Content-Encoding’ field in the HTTP response headers; used to indicate the type of data encoding

Parameters in <head> Tags

  • Meta Refresh → content of the <meta http-equiv="refresh"> tag in the <head> section of the document
  • Rel Next/Prev URL → content of the <link rel="next" /> and <link rel="prev" /> tags, used to indicate the relationship between component URLs in a paginated series

Content Parameters

  • h1 Value → content of the first non-empty <h1> tag on the target URL
  • h1 Length → number of characters in the first non-empty <h1> tag on the target URL
  • h2-h6 Headers → number, value, and length of h2-h6 headers on the target URL: these parameters are disabled by default, but you can enable their analysis if needed
  • HTML Size → number of characters in the <html> section of the target page, including HTML tags
  • Content Size → number of characters (including spaces) in the <body> section of the target page, excluding HTML tags
  • Text/HTML Ratio → percentage of text content on the target page, rounded to the nearest integer
  • Characters → number of characters (excluding spaces) in the <body> section of the target page, excluding HTML tags
  • Words → number of words in the <body> section of the target page
  • Characters in <p> → number of characters (excluding spaces) in <p> </p> tags in the <body> section of the target page
  • Words in <p> → number of words in <p> </p> tags in the <body> section of the target page
  • Page Body Hash → unique key of the page <body> section calculated using the SHA1 algorithm
  • Images → number of images found in <img> tags on the target page; you can also see the image alt attributes and the URL source view of links to the images
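
If you are curious how parameters of this kind can be measured at all, here is a rough standard-library sketch (an approximation we wrote for illustration, not the tool’s implementation) that estimates Response Time, Content Download Time, and several of the size metrics for a single hypothetical URL:

```python
import re
import time
import urllib.request

url = "https://example.com/"  # hypothetical target URL

start = time.perf_counter()
with urllib.request.urlopen(url, timeout=10) as response:
    # Time until the status line and headers arrive ≈ Response Time (TTFB)
    ttfb_ms = (time.perf_counter() - start) * 1000
    raw = response.read()
    # Additional time spent reading the body ≈ Content Download Time
    download_ms = (time.perf_counter() - start) * 1000 - ttfb_ms
    x_robots = response.headers.get("X-Robots-Tag", "")              # X-Robots-Tag Instructions
    content_length = response.headers.get("Content-Length")          # Content-Length
    content_encoding = response.headers.get("Content-Encoding", "")  # Content-Encoding

html = raw.decode("utf-8", errors="replace")
body = re.search(r"<body.*?>(.*?)</body>", html, re.S | re.I)
text = re.sub(r"<[^>]+>", " ", body.group(1) if body else html)  # crude tag stripping

html_size = len(html)                                             # HTML Size
content_size = len(text)                                          # Content Size
text_html_ratio = round(100 * content_size / max(html_size, 1))   # Text/HTML Ratio
words = len(text.split())                                         # Words

print(ttfb_ms, download_ms, x_robots, content_length, content_encoding)
print(html_size, content_size, text_html_ratio, words)
```

Netpeak Spider certainly computes these values more carefully (and in parallel for many URLs), but the sketch shows which raw measurements each parameter boils down to.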

Each issue is directly connected to the parameter in which it can be detected. For instance, to check whether the <title> tags on the website are implemented correctly, you need to select the Title parameter in the ‘Parameters’ tab of the crawling settings.

3. New logic of working with the results

So much information has to fit into this section that we had to resort to lists inside lists :) So, let’s go.

3.1. Completely new results table

We’ve integrated a completely new results table into Netpeak Spider 2.1, and we hope you’ll enjoy the features listed below:

Speed performance

It doesn’t actually matter how many results you have in the new table – one hundred or one million. You’ll be really surprised by the table response time, sometimes even doubting whether you managed to scroll to the right place so quickly :) In short, we did our best to provide you with a better user experience, and we’d be more than happy to hear your feedback.

Possibilities

✔ Grouping

Now you can group data by any parameter in any table, which gives you new ways of looking at the crawling results. For instance, you can group the results by the Status Code parameter and determine which status codes are most common for a certain type of page. In the screenshot, you can see how grouping by status code works → this website has some pages with ‘bad behavior’ :) Note that grouping is possible not only by one column but by several columns at once. Imagine the insights you may get from the right combinations.

✔ Columns on/off

If you right-click any column name, you will see a convenient panel where you can show or hide any column that is enabled in the ‘Parameters’ tab of the crawling settings. Be aware that export respects these settings, so the exported file will include only the columns whose view is switched on.

✔ Freezing columns

Now you can freeze any number of columns; the ‘Number’ and ‘URL’ columns are frozen by default. Saving the column width, order, and freezing is planned for future updates, but for now these settings are kept only within the current session (i.e. until you close the app).

3.2. New internal tables

Types of tables

✔ Issues info

We are proud to present a new additional table where you can see all the issues found for the selected URL or group of URLs. Here, you can filter URLs by issue type, severity, and the parameters in which the issue was detected.

✔ Redirects

An updated table that shows all the redirects and redirect chains found on the page(s); the status codes of the whole redirect chain are taken into account.

✔ Links

A completely new table that contains really useful data about the link type, anchor, alt attribute (if the link contains an image), rel attribute, and even the URL source view.

✔ h1-h6 headers

Each header level has its own table. If you need to analyze h2-h6 headers, don’t forget to enable their crawling in the ‘Parameters’ tab of the crawling settings.

✔ Images

A new additional table with data about all the images found in <img> tags on the page(s).

New opportunities

✔ Current Table Summary

Another thing we are proud of is a unique feature that allows you to open the necessary information (issues, links, redirects, h1 headers, or images) for all the pages in the current table.

Try filtering the table by simply clicking any issue type in the ‘Issues’ panel on the right side (e.g. 4xx Error Pages: Client Error, if any) and then selecting Current Table Summary → Incoming Links. In this case, you’ll get a complete list of broken links. After 2 minutes of crawling Amazon.com we detected several pages that return a 4xx status code, and we can easily see all the broken links pointing to these pages.

✔ Export

Now every internal table can be exported, just like the information in the main results tables.

✔ Filters

Many new parameters to filter the data by have been added, as well as summary filters such as ‘All parameters’ (in this case all the cells in the results table are filtered) and ‘URLs with issue’ (available only if the appropriate parameters are selected). ‘Length’ is one more criterion you can now filter by → any cell in the table can be filtered by the length of its value.

Try combining the last two features: filter first and then press ‘Export’ → only the filtered results will be exported.

✔ Ways to choose the data

For your convenience there are now three ways you can choose the data:

  • one URL → select any cell and call any internal table – you’ll get the data only for the selected URL;
  • a group of URLs → select several URLs (by dragging with the left mouse button or using the SHIFT/CTRL keys) and call one of the internal tables – in this case, the data will be grouped for the selected URLs;
  • all URLs in the current table → click the ‘Current Table Summary’ button and choose any internal table; this way you’ll get the data for all the URLs in the table.

By combining different ways of handling the data, you can work with the crawling results most efficiently. We’d be really happy to get your feedback, as we’ve put a lot of effort into improving Netpeak Spider’s usability.

3.3. Highlighting the problems

Now, if a particular URL contains an issue, only the URL cell and the corresponding parameter cells are highlighted, not the whole row. The color depends on the highest issue severity in the row or cell, so you won’t see the entire website highlighted in red anymore. We removed the option to customize table colors in order to keep the issues consistently prioritized by their severity.

3.4. Better distinction of link types

Now all the links are divided into exact types:

  • AHREF → the most common link, from the <a href=""> tag
  • IMG AHREF → so-called image links – images from the <img> tag placed inside an <a href=""> tag
  • IMG → links to images from the <img> tag
  • CSS → links to cascading style sheets
  • JavaScript → links to JS files
  • Canonical → links from the <link rel="canonical" /> tag in the <head> section
  • Redirect → if Netpeak Spider detects a redirect to any URL, it marks a ‘Redirect’ type link to this page
  • LINK → to enable the detection of this type of link, you need to check crawling of URLs from the <link> tag in the ‘General’ tab of the crawling settings
  • Meta Refresh → to detect this type of link, you should also check the option to consider Meta Refresh in the ‘Advanced’ tab of the crawling settings

Besides, we added some more parameters to every internal table with links:

  • Alt → useful for image links: the anchor of such a link is the image alt attribute (if it exists) inside the <a href=""> tag
  • Rel → use it to detect links with rel="nofollow" and other values of this attribute (learn more)
  • URL Source View → a unique feature that shows the original markup of the link (just as the crawler sees it); it’s helpful when you need to find the exact link in the page source

You can filter all the links by their type and see how the link anchor is formed (e.g. from the image ALT attribute inside an <a> tag).
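
For readers who want to see the idea in code, below is a simplified sketch of such link typing (our own approximation, not Netpeak Spider’s parser): every outgoing reference found in the HTML is labelled with one of the types above, and the anchor, alt, and rel values are captured where they apply. The Redirect type is omitted because it comes from HTTP responses rather than from the markup.

```python
from html.parser import HTMLParser

class LinkTyper(HTMLParser):
    """Labels outgoing references with the link types described above."""

    def __init__(self):
        super().__init__()
        self.links = []       # tuples of (type, url, anchor, alt, rel)
        self._open_a = None   # attributes of the <a> tag we are currently inside
        self._anchor = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self._open_a, self._anchor = a, ""
        elif tag == "img" and a.get("src"):
            if self._open_a:  # image inside a link → IMG AHREF, anchor comes from alt
                self.links.append(("IMG AHREF", self._open_a["href"], a.get("alt", ""),
                                   a.get("alt", ""), self._open_a.get("rel", "")))
                self._open_a = None
            self.links.append(("IMG", a["src"], "", a.get("alt", ""), ""))
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower()
            kind = {"stylesheet": "CSS", "canonical": "Canonical"}.get(rel, "LINK")
            self.links.append((kind, a["href"], "", "", rel))
        elif tag == "script" and a.get("src"):
            self.links.append(("JavaScript", a["src"], "", "", ""))
        elif tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            # The redirect target sits inside the content attribute, e.g. "0; url=/new"
            self.links.append(("Meta Refresh", a.get("content", ""), "", "", ""))

    def handle_data(self, data):
        if self._open_a:
            self._anchor += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open_a:
            self.links.append(("AHREF", self._open_a["href"], self._anchor.strip(),
                               "", self._open_a.get("rel", "")))
            self._open_a = None

parser = LinkTyper()
parser.feed('<a href="/shoes" rel="nofollow"><img src="/i/shoe.png" alt="Red shoe"></a>'
            '<link rel="canonical" href="https://example.com/shoes">')
for link in parser.links:
    print(link)
```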

3.5. Ways to view the data and interact with it

We have completely reorganized all the tables and added new logic: if you see underlined URLs or numbers, it means you can interact with them. For example, if you select an underlined URL and press the Space key or double-click it with the left mouse button, it will open in your default browser.

If you repeat these actions on the number of incoming links, you’ll open the internal table where all the incoming links to this page (or pages) are shown. Data you can interact with is underlined and marked in blue – try double-clicking such cells to discover the possible kinds of interaction.

3.6. Other improvements in the tables

Real time work

You don’t have to stop crawling to filter or export the data – you can now work with all the tables in real time, even during crawling. For example, you can set up filtering in the ‘Filters’ table and start crawling – the data will then automatically appear in the table according to the filter you set, which is extremely convenient if you are looking for specific information on the website.

Sorting

We offer three types of sorting: descending (by default), ascending, and ‘no sorting’, which you get by clicking the same column a third time.

Dividing the tables into separate ones

We have divided the tables into separate, independent ‘All’, ‘Issues’, and ‘Filters’ tables. Now changes to column order or width in one table are not synchronized with the other tables.

Tooltips

If there is not enough space to display all the information in a cell, you’ll see an ellipsis (...). Hover your cursor over such a cell and you’ll immediately see a tooltip with the full cell contents (note that no tooltip appears if all the data is already visible). This saves you from expanding the columns every time you can’t see all the data.

Hotkeys

You can open the internal tables using the F1-F8 hotkeys. Right-click the table to open the context menu, where you’ll find all the available combinations.

4. Changes in crawling settings

4.1. New approach to handle the settings

Default crawling settings are now shared by all projects. However, once you start crawling, the project’s settings are saved, and the next time you switch to another project you’ll see a prompt like ‘Crawling settings of the current project and the selected one are different. Apply last crawling settings to the selected URL?’

This way you can easily work both with specific settings for each project and with common settings for all projects if they are the same across different websites.

4.2. Settings comparison and autosave

The settings are now saved automatically every time you close the window or press the ‘OK’ button, so rest assured that your changes in the different settings tabs will be preserved.

To avoid unnecessary inconvenience with the crawling settings, we have come up with a settings comparison logic → if you have the same settings in various projects, you can switch between them without any pop-up windows; you will see one only if the settings differ.

4.3. New settings

General

Now you can disable crawling of all MIME types except HTML files and redirects. This can be helpful when you don’t need to crawl, for instance, RSS feeds or PDF documents.
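
The idea behind this setting can be illustrated with a tiny sketch (a hypothetical helper, not the actual implementation): look at the Content-Type response header and only keep HTML documents and redirects for further processing.

```python
# Hypothetical helper illustrating the idea of this setting, not the tool's code:
# keep HTML pages and 3xx redirects, skip PDF documents, RSS feeds, images, etc.
ALLOWED_PREFIXES = ("text/html", "application/xhtml+xml")

def should_process(status_code, content_type):
    if 300 <= status_code < 400:   # redirects are always kept
        return True
    return content_type.lower().startswith(ALLOWED_PREFIXES)

print(should_process(200, "text/html; charset=utf-8"))  # True
print(should_process(200, "application/pdf"))            # False
print(should_process(301, "application/rss+xml"))        # True, because of the redirect
```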

Parameters

Welcome the new tab with all crawling parameters, complete with tooltips explaining their meaning and the possible related issues.

Advanced

  • A new setting to consider the instructions from the X-Robots-Tag in the HTTP response header, if it exists
  • The logic of processing canonical URLs has been improved → if you enable consideration of Canonical Link Element instructions, Netpeak Spider takes into account the content of this field in the HTTP response header and gives it higher priority than the similar content in the <head> section of the page (see the sketch after this list)
  • A new setting that allows parsing pages that return 4xx errors: note that the ‘Retrieve 4xx error pages content’ setting is off by default
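
The canonical priority rule from the second point can be sketched as follows (our reading of the description, with hypothetical helper functions): a canonical URL announced in the Link HTTP response header wins over the <link rel="canonical" /> element in the <head> section.

```python
import re

def canonical_from_header(link_header):
    """Extract the URL from a header like: <https://example.com/a>; rel="canonical"."""
    match = re.search(r'<([^>]+)>\s*;\s*rel\s*=\s*"?canonical"?', link_header, re.I)
    return match.group(1) if match else None

def canonical_from_html(html):
    """Naive extraction; assumes rel comes before href inside the <link> tag."""
    match = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I)
    return match.group(1) if match else None

def resolve_canonical(link_header, html):
    # The HTTP response header takes priority over the <head> element
    return canonical_from_header(link_header) or canonical_from_html(html)

print(resolve_canonical('<https://example.com/page>; rel="canonical"',
                        '<link rel="canonical" href="https://example.com/other">'))
# Prints https://example.com/page → the header wins
```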

5. Export of the results

  • Export to Excel has been improved → results are now exported as quickly as possible
  • Export to CSV has been added → this format is a perfect fit when working with large amounts of data
  • The exported file name is now generated automatically, so you can see at once which table you worked in or which selection type you used in the internal tables
  • The separate dialog box with export settings has been removed → this shortens the time it takes to get to the final result (i.e. the exported file). The previous option to choose parameters for export has been moved to the crawling settings.

6. New projects structure, application data storage and crawling

  • The crawling process has been reworked, and its speed now depends directly on the chosen parameters
  • The structure for saving results has been modified → unfortunately, we were unable to migrate old saved projects to the new, optimized structure, so previously saved results files are not compatible with Netpeak Spider 2.1
  • Saved results are now compressed, which decreases file size by a factor of 4
  • Crawling speed has been increased threefold
  • An advanced system that stores resource-heavy data on the hard disk, reducing RAM usage and thus making it possible to crawl large websites

7. Other changes

  • Due to all the changes described above and the totally new program architecture, PageRank calculation is temporarily unavailable. The upcoming Netpeak Spider 2.1.3 release will provide optimized internal PageRank calculation logic!
  • In-session filter saving for the internal tables: Issues info, Redirects, Links, h1-h6 headers, Images
  • The Status Code parameter has been improved, particularly its informativeness. All status codes are now supported, and it won’t return ‘429 429’ anymore
  • When loading crawling results, the Crawled URLs and Crawling Duration parameters are shown in the status bar to indicate the number of URLs and the time spent crawling
  • The program now loads more smoothly

The future is not set!

It is you who should influence Netpeak Spider’s development – leave your feedback, ask any questions that bother you, share your ideas, or support those of other users.

A quick recap

Now that the detailed review is finished, let’s sum it up. Netpeak Spider has become smarter, more flexible, and more powerful. The benefits you get from the new Netpeak Spider 2.1 are:

  • Detection of more than 50 on-page SEO issues
  • 24 new parameters and the option to select/set them
  • Absolutely new results table
  • Improved logic for handling the data, including new internal tables
  • Optimized export of the results and application architecture
  • More convenient way to operate the results and more flexible crawling settings

If you haven’t had the opportunity to try Netpeak Spider yet, we are pleased to offer you a 14-day free trial that grants full access to all the tool’s features. If you and Netpeak Spider are old friends, don’t waste any time – give the updated version a try before the global tool testing ends on August 19, 2016.

Try Netpeak Software Products

I’m proud of the work we’ve done and would like to get your feedback and advice on how to improve the program further. Come with me if you wanna do effective SEO! And I’ll be back... with updates!


We suggest you check out our next post from the Netpeak Software series → a review of Netpeak Spider 2.1.1. This update features two new crawling modes (list of URLs and XML sitemap), external link analysis, viewing the page source and HTTP headers, as well as five new issues we are now able to detect.
