Import Wikipedia Data to Google Sheets
- Part 1: Create your API Request URL
- Part 2: Pull Wikipedia API Data into Sheets
- Part 3: More Example API URLs
We’ll first follow the Wikimedia REST API documentation to pull in information about a random Wikipedia article, then show how to extend this to other API requests.
- Base URL: https://en.wikipedia.org/api/rest_v1
- Endpoint: /page/random/summary
Putting it together, we get the full API Request URL:
https://en.wikipedia.org/api/rest_v1/page/random/summary
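To make the concatenation explicit, here is a minimal Python sketch that joins the base URL and endpoint listed above. The helper name `build_url` is our own illustration and not part of API Connector or the Wikimedia API.

```python
# Base URL and endpoint exactly as listed above.
BASE_URL = "https://en.wikipedia.org/api/rest_v1"
ENDPOINT = "/page/random/summary"


def build_url(base: str, endpoint: str) -> str:
    """Join a base URL and an endpoint with exactly one slash between them.

    Hypothetical helper written for this walkthrough.
    """
    return base.rstrip("/") + "/" + endpoint.lstrip("/")


request_url = build_url(BASE_URL, ENDPOINT)
print(request_url)
```

Stripping the slashes before joining means the result is the same whether or not the base ends with a slash or the endpoint starts with one.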
We can now enter our URL into API Connector and start importing Wikipedia data into Google Sheets.
- Open up Google Sheets and click Add-ons > API Connector > Create New API Request.
- In the Create Request interface, enter the Request URL we just created.
- Under Headers, enter a key-value pair with User-Agent as the key and your contact information as the value.
While this request will work without a header, Wikipedia’s global API rules ask that you set a unique User-Agent that allows them to contact you quickly. You may use an email address or the URL of a contact page.
- Create a new tab. You can call it whatever you like, but here we’ll call it ‘Wiki Random’. While still in that tab, click ‘Set’ to use that tab as your data destination.
- Name your request. Again, we’ll call it ‘Wiki Random’.
- Click Run and a moment later you’ll see information about a random article populate your Google Sheet. If you’d like more random articles, switch to Append mode, and hit Run several times.
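For readers who want to see what the steps above amount to outside of Sheets, here is a rough Python equivalent using only the standard library. It is a sketch under our own assumptions, not how API Connector works internally: the function names are hypothetical, the contact value is a placeholder you must replace, and the dotted column names produced by `flatten` are an illustrative choice that may differ from the headers API Connector generates.

```python
import json
import urllib.request

RANDOM_SUMMARY_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"


def build_request(url: str, contact: str) -> urllib.request.Request:
    """Create a GET request whose User-Agent header carries contact details."""
    return urllib.request.Request(url, headers={"User-Agent": contact})


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def flatten(value, prefix: str = "") -> dict:
    """Flatten nested dicts into a single row keyed by dotted paths."""
    if not isinstance(value, dict):
        return {prefix: value}
    row = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        row.update(flatten(child, path))
    return row


def fetch_random_rows(count: int, contact: str) -> list:
    """Mimic Append mode: run the request `count` times and collect the rows."""
    rows = []
    for _ in range(count):
        request = build_request(RANDOM_SUMMARY_URL, contact)
        rows.append(flatten(fetch_json(request)))
    return rows
```

Calling `fetch_random_rows(3, "you@example.com")` (with your own email address or contact page URL in place of the placeholder) would return three flattened rows, one per random article, much like hitting Run three times in Append mode.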
Experiment with the endpoints described in the Wikimedia REST API documentation, as well as the documentation on pageview analytics, to see other types of Wikipedia responses. If you just want to get started, you can try out the following requests, one at a time:
- Get metadata and an abstract about a specific page:
- List of births on a specific day:
- Pageviews per day in April 2020 for the Wikipedia article on Tiger King:
- Top 1000 articles in March 2020 (Tip: switch to the Compact report style to avoid timing out):
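As a hedged illustration of how requests like these can be assembled, the Python sketch below builds URLs from the route patterns in the public Wikimedia REST API and Pageviews API documentation as we understand them. The function names, the `all-access` and `all-agents` filters, and the article title `Tiger_King` are our assumptions, so check each URL against the documentation before relying on it.

```python
from urllib.parse import quote

# Assumed hosts: page and feed routes live on the per-wiki host,
# pageview metrics on the shared wikimedia.org host.
WIKIPEDIA_REST = "https://en.wikipedia.org/api/rest_v1"
WIKIMEDIA_REST = "https://wikimedia.org/api/rest_v1"


def _title(title: str) -> str:
    """Use underscores for spaces and percent-encode everything else."""
    return quote(title.replace(" ", "_"), safe="")


def page_summary_url(title: str) -> str:
    """Metadata and an abstract for one page."""
    return f"{WIKIPEDIA_REST}/page/summary/{_title(title)}"


def births_url(month: int, day: int) -> str:
    """Births on a given calendar day."""
    return f"{WIKIPEDIA_REST}/feed/onthisday/births/{month:02d}/{day:02d}"


def daily_pageviews_url(article: str, start: str, end: str,
                        project: str = "en.wikipedia") -> str:
    """Daily pageviews for one article; dates are YYYYMMDD strings."""
    return (f"{WIKIMEDIA_REST}/metrics/pageviews/per-article/{project}"
            f"/all-access/all-agents/{_title(article)}/daily/{start}/{end}")


def top_articles_url(year: int, month: int,
                     project: str = "en.wikipedia") -> str:
    """Most-viewed articles for a whole month."""
    return (f"{WIKIMEDIA_REST}/metrics/pageviews/top/{project}"
            f"/all-access/{year}/{month:02d}/all-days")


print(daily_pageviews_url("Tiger King", "20200401", "20200430"))
print(top_articles_url(2020, 3))
```

Each function returns a plain string, so the result can be pasted into the Request URL field as in Part 2.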
The article Create API Request Based on a Cell describes how you can point to a cell to dynamically change the date or endpoint in the URL, which can be very convenient for constructing requests.
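The idea of driving a URL from a cell can be sketched in plain Python. This is only an analogy: the `{endpoint}` placeholder is Python's `str.format` syntax, not API Connector's cell-reference syntax, and the `cells` dictionary is a hypothetical stand-in for values typed into a sheet.

```python
# Template with one placeholder for the part of the URL we want to vary.
URL_TEMPLATE = "https://en.wikipedia.org/api/rest_v1{endpoint}"

# Hypothetical stand-in for spreadsheet cells: cell reference -> cell value.
cells = {"B1": "/page/random/summary"}


def url_from_cells(template: str, cells: dict, mapping: dict) -> str:
    """Fill each template placeholder with the value of its mapped cell."""
    values = {name: cells[ref] for name, ref in mapping.items()}
    return template.format(**values)


# Changing the value stored under "B1" changes the URL that gets requested.
print(url_from_cells(URL_TEMPLATE, cells, {"endpoint": "B1"}))
```

Keeping the variable part in one place means a new endpoint or date only has to be typed once, which is the convenience the cell-based approach offers.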