Database of Pages and Links

After a website has been crawled, all its data are stored in the Site Visualizer project (*.SVZ file) as tables of one simple database, which has the following scheme:

Site Visualizer DB

The program uses PostgreSQL Database Management System (DBMS), which is being installed automatically along with Site Visualizer. All predefined reports are written using SQL.

To better understand how reports work, right-click the All Responses report and select the Edit... command in the context menu. The SQL text will be as follows:

SELECT response, response_code, COUNT(*) AS number
FROM pages
GROUP BY response, response_code
ORDER BY response_code

The above SQL code means that we want to see the numbers of each link response, ordered by response code. You can see a screenshot of the resulting table in the Reports topic.

Example of Duplicate H1 Headings

If you are new to SQL and don't want to learn it but need a specific report, you can ask us to do it for you. Otherwise, if you'd like to create custom reports yourself, here is an example showing how to do it.

For instance, let's find all pages that have more than one <H1> heading. First find the identifiers of all pages that contain duplicate <H1> headings:

SELECT page_id /* 1. Select IDs of all pages that have more than one H1 */
FROM headings
WHERE level=1
GROUP BY page_id
HAVING COUNT(*)>1

Then, for better understanding, append the headings text (wrap the above SQL query in a t0 subquery):

SELECT t0.page_id, t1.h_text FROM -- 2. Select text of all found H1 headings 
(
  SELECT page_id /* 1. Select IDs of all pages that have more than one H1 */
  FROM headings
  WHERE level=1
  GROUP BY page_id
  HAVING COUNT(*)>1
) AS t0 
LEFT JOIN headings AS t1 on t0.page_id=t1.page_id AND t1.level=1

Finally, to quickly find the page (to remove or fix duplicate headings), append the page's URL and title text from the Pages table. Wrap the above statement again (nested queries), and here is the resulting query:

SELECT url,title,h_text -- 3. Append URL and title   
FROM pages RIGHT JOIN
(
  SELECT t0.page_id, t1.h_text FROM -- 2. Select text of all found H1 headings 
  (
    SELECT page_id /* 1. Select IDs of all pages that have more than one H1 */
    FROM headings
    WHERE level=1
    GROUP BY page_id
    HAVING COUNT(*)>1
  ) AS t0 
  LEFT JOIN headings AS t1 on t0.page_id=t1.page_id AND t1.level=1
) AS t0
ON pages.id=t0.page_id

Copy the resulting SQL query (see above) and paste it into the text input box on the Database tab of the Site Visualizer main window. Then click the Execute button or press the F5 key:

Duplicate H1

The Result Set tab contains the table with the data obtained by executing the query. To export the data to an external application, such as MS Excel, use the Select All and Copy Rows commands.

To quickly sort a table, click a column title. The table will be sorted by that column in ascending order. Another click on the same column title will produce sorting in descending order.

The Messages tab contains possible SQL execution errors and their positions, and also success messages, such as the number of rows affected. Click Tools -> Save As Custom Report menu item to save the SQL text as a new Custom Report and use it regularly.

Technical Information

The app uses PostgreSQL v9.5.25, port number 4891. If you have PostgreSQL already installed and you'd like Site Visualizer to use your existing database system, or you'd like to integrate the app into your database, please contact us.

 Try Site Visualizer Now    Buy Site Visualizer Now