BY eric / Last Update: March 25, 2022
data export, table, web page, html, tag, div, data, data scraper, data miner, chrome, extension, productivity
In short, Mr. Table is a Chrome browser extension that extracts data from tables (e.g.
<table>*</table>) on web pages and saves the extracted data in either CSV or JSON format.
We often need to collect data from the Internet for work or study. However, data presented on web pages is often not in the format we want. For example, much of the data on web pages is laid out using HTML tags such as
<div></div>, whereas we want data that can be processed by our programs or tools (e.g. Excel).
With “Mr. Table”, data can be converted from what you see on a web page into a format you can actually use.
To get the ‘Mr. Table’ Chrome extension, please visit this link.
Data is often presented using the HTML table tag and its related tags, as follows:
- <table> for the table
  - <thead> for the table column names
    - <tr> for the table header row
      - <th> for the table header cell
  - <tbody> for the actual data
    - <tr> for a data row
      - <td> for a data cell
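To make the tag structure above concrete, here is a minimal Python sketch that walks a `<table>` and collects the `<th>` and `<td>` text. It is not how Mr. Table itself is implemented; the `TableParser` class, the `parse_table` helper, and the sample markup (including the ages) are made up for illustration, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect header and data cell text from <table> markup (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of every <th> cell
        self.rows = []      # one list of <td> texts per data <tr>
        self._cell = None   # "th" or "td" while inside a cell
        self._text = []     # text fragments of the current cell
        self._row = None    # cells collected for the current <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell == tag:
            text = "".join(self._text).strip()
            if tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr":
            # A header row holds only <th> cells, so its row list stays empty.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return (headers, rows) for the table markup in `html`."""
    parser = TableParser()
    parser.feed(html)
    return parser.headers, parser.rows


# Hypothetical sample markup; the values are invented for this example.
SAMPLE = """
<table>
  <thead><tr><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>John Doe</td><td>30</td></tr>
    <tr><td>John Smith</td><td>41</td></tr>
  </tbody>
</table>
"""

headers, rows = parse_table(SAMPLE)
print(headers)
print(rows)
```

The sketch assumes a single, non-nested table; a page with several tables would need the rows grouped per `<table>` element.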
Table with Pure CSS
Data is also often presented using pure CSS, where the data is grouped in <div> tags and styled with CSS classes, for example:
<!-- Column header is the first item of list -->
First, visit the Chrome Web Store and click on ‘Add to Chrome’.
Once you have the extension installed, you can pin ‘Mr. Table’ to the toolbar, where its icon will then be visible.
Open a web page containing the data you would like to extract, then click on the ‘Mr. Table’ icon to see a popup window as follows:
There are a few options:
- You can select the output file type: CSV or JSON
- If you click on the “Advanced Options”, you can see a list of selectors:
- In some cases there might be many tables on a single page, so if you would like to extract only one of them, you can highlight that particular table, then click on “Export Selected”.
- To make things simpler, you can just click on “Export All”.
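To illustrate the difference between the two output types, the following Python sketch serializes the same header and rows as CSV and as JSON. The exact layout that Mr. Table writes is not documented here, so treat this as one common convention rather than the extension's actual output; `to_csv`, `to_json`, and the sample values are hypothetical.

```python
import csv
import io
import json


def to_csv(headers, rows):
    """Serialize a table as CSV text: one header line, then one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(headers, rows):
    """Serialize a table as a JSON array with one object per row, keyed by header."""
    records = [dict(zip(headers, row)) for row in rows]
    return json.dumps(records, indent=2)


# Invented sample data for illustration.
headers = ["Name", "Role"]
rows = [["John Doe", "Admin"], ["John Smith", "Editor"]]

print(to_csv(headers, rows))
print(to_json(headers, rows))
```

CSV keeps the column order and opens directly in spreadsheet tools such as Excel, while JSON repeats the column names in every record and is usually easier to consume from a program.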
For example, suppose I would like to extract the code change tables from the ASX website for my algorithmic trading system, so I go to this page.
After extracting all tables, you can see something like the following:
Then you can download each table individually.
Export Data from the <table> Tag
For data presented in an HTML table tag, you can simply use the default settings with the preset table selector, column selector, cell selector, etc., which can be found in the advanced options:
Export Data from CSS Table
Unfortunately, such tables can only be extracted by specifying the selectors manually.
Below is a table using pure CSS:
The following is the corresponding code in HTML:
<div class="table-cell">John Doe</div>
<div class="table-cell">John Smith</div>
So we can set the selectors to:
- Table Selector: .table
- Header Row Selector: .table-header
- Header Cell Selector: .table-header-cell
- Data Row Selector: .table-row
- Data Cell Selector: .table-cell
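The way these class selectors map onto a table can be sketched in Python. This is only an illustration of the idea, not the extension's code: the `DivTableParser` class is hypothetical, it matches plain class names rather than full CSS selectors, and the sample markup is an assumed layout (with an invented `Name` header) built around the cells shown above.

```python
from html.parser import HTMLParser


class DivTableParser(HTMLParser):
    """Extract a <div>-based table by matching class names (illustrative only)."""

    def __init__(self, header_cell="table-header-cell",
                 data_row="table-row", data_cell="table-cell"):
        super().__init__()
        self.header_cell = header_cell
        self.data_row = data_row
        self.data_cell = data_cell
        self.headers = []
        self.rows = []
        self._stack = []   # role of every open <div>, or None if irrelevant
        self._text = []    # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        role = None
        if self.header_cell in classes:
            role = "header_cell"
            self._text = []
        elif self.data_cell in classes:
            role = "cell"
            self._text = []
        elif self.data_row in classes:
            role = "row"
            self.rows.append([])
        self._stack.append(role)

    def handle_data(self, data):
        # Only keep text while the innermost open <div> is a cell.
        if self._stack and self._stack[-1] in ("header_cell", "cell"):
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        role = self._stack.pop()
        text = "".join(self._text).strip()
        if role == "header_cell":
            self.headers.append(text)
        elif role == "cell" and self.rows:
            self.rows[-1].append(text)


# Assumed markup: the header cell and the nesting are hypothetical.
SAMPLE = """
<div class="table">
  <div class="table-header">
    <div class="table-header-cell">Name</div>
  </div>
  <div class="table-row">
    <div class="table-cell">John Doe</div>
  </div>
  <div class="table-row">
    <div class="table-cell">John Smith</div>
  </div>
</div>
"""

parser = DivTableParser()
parser.feed(SAMPLE)
print(parser.headers)
print(parser.rows)
```

Because class names are compared as whole tokens, `table-cell` does not accidentally match `table-header-cell`, which is the same distinction the separate header and data cell selectors draw.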
Set the proper selectors in the advanced options as follows:
Then click “Export All”, and you will see that the sample table above is exported correctly:
We can work on a smarter way to extract data from such tables if we get more support.
You can contact the developer via: