Query a public BigQuery dataset using SQL

Wednesday, January 29, 2025

Introduction

In this project, you will learn how to query a public dataset using Google BigQuery. BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. By leveraging public datasets, you can explore large data sets without uploading your own data, making it a great way to learn SQL and data analysis in the cloud—free if you stay within BigQuery’s free tier limits.

Why Is It Useful?

Hands-on SQL Skills: You can practice real-world queries with large datasets.

No Data Upload Needed: Public datasets are already available, saving time and cost.

Scalable and Serverless: No infrastructure to manage, so you focus only on your queries.

Prerequisites

  • Google Cloud Account (Free Tier, No Credits Required)

Sign up at cloud.google.com if you don’t have one.
BigQuery offers a certain amount of free queries each month; ensure you stay within free limits.

  • Installed Google Cloud SDK (CLI Tools)

Includes the bq command-line tool for BigQuery.
Install it from Google Cloud SDK documentation.

  • Proper Permissions

You need Owner or Editor role in your Google Cloud project to run BigQuery queries.
Typically, the default project you create will allow you to use BigQuery.

  • BigQuery API Enabled

Ensure the BigQuery API is enabled in your project.
You can enable it via APIs & Services in the console or using the CLI command shown below.

Step 1: Select or Create a Project

  • Console

Go to Google Cloud Console.
In the top-left dropdown, select an existing project or create a new one (for example: my-bigquery-project).


  • CLI : gcloud config set project YOUR_PROJECT_ID

Step 2: Enable BigQuery API

  • Console

In the Console, go to APIs & Services > Library.
Search for BigQuery API, select it, and click Enable if it’s not already enabled.

  • CLI : gcloud services enable bigquery.googleapis.com

Step 3: Opening BigQuery

Console : In the left navigation menu, scroll down to Big Data or Analytics section and click BigQuery.
This opens the BigQuery console (also known as the BigQuery web UI).

CLI : The BigQuery CLI tool is called bq. You can verify it’s installed by running: bq version

Step 4: Query the Public Dataset

We’ll run a simple SQL query on the bigquery-public-data.samples.shakespeare table.

Console : In the BigQuery console, under Explorer, expand bigquery-public-data > samples.
Click on shakespeare. You’ll see table details and a Query button.
Click Query (or New Query at top).

Paste this SQL:

SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE word IN ('love', 'death', 'king')
ORDER BY word_count DESC
LIMIT 10;


Click Run. You’ll see the results in the console below.

CLI :
bq query --use_legacy_sql=false \
'SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE word IN ("love", "death", "king")
ORDER BY word_count DESC
LIMIT 10;'

Step 5 : Run Your Query

In that query editor, you can paste the SQL, for example:

SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE word IN ('love', 'death', 'king')
ORDER BY word_count DESC
LIMIT 10;

Then click “Run” at the top of the editor. The results will appear below once the query finishes.

Step 6 : Verifying and Testing the Project

  • Check Query ResultsIf you see rows for your words (“love”, “death”, “king”) and their word counts, your query worked successfully.
  • Confirm Free Tier UsageMake sure your processed data is within the free tier limit (usually 1 TB per month for queries). This example query is very small, so it should be well within the free limit.
  • Try Different Words
    Modify the WHERE clause to explore other words or run aggregate queries. Example:
    SELECT word, SUM(word_count) as total_count
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY total_count DESC
    LIMIT 5;

Conclusion

In this guide, we have:

  • Queried a public BigQuery dataset (shakespeare) without using any extra credits, staying within the free tier.

  • Learned how to do it via Google Cloud Console (GUI) and the CLI (bq).

  • Understood where to take screenshots for your documentation.

  • Explored common troubleshooting steps, from enabling APIs to verifying permissions.

What is Cloud Computing ?

Cloud computing delivers computing resources (servers, storage, databases, networking, and software) over the internet, allowing businesses to scale and pay only for what they use, eliminating the need for physical infrastructure.


  • AWS: The most popular cloud platform, offering scalable compute, storage, AI/ML, and networking services.
  • Azure: A strong enterprise cloud with hybrid capabilities and deep Microsoft product integration.
  • Google Cloud (GCP): Known for data analytics, machine learning, and open-source support.