- Two-Sentence Prerequisite
- The Setup (One-time activity)
- 1 — Install Kaggle CLI
- 2 — API credentials
- Downloading Dataset via CLI
Two-Sentence Prerequisite
Kaggle is a data science platform where you can find competitions, datasets, and other people's solutions.
Some Kaggle datasets cannot be downloaded directly from the website and can only be downloaded through Kaggle's CLI.
The Setup (One-time activity)
1 — Install Kaggle CLI
To get started with the Kaggle CLI, you will need Python. Open a terminal and run:
$ pip install kaggle
2 — API credentials
Once it is installed, type kaggle to check that it works. The output shows the path where you need to put your kaggle.json file.
To get the kaggle.json file, go to your Kaggle account page.
In the API section, click Create New API Token. This downloads kaggle.json; copy it to the path mentioned in the terminal output.
Type kaggle again to check that it works.
In my case, it still did not work after copying. The file was in place, but it did not have the right permissions, so I had to run the exact command suggested in the output. After that, it started working.
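If you would rather script this one-time step, here is a minimal Python sketch. It assumes the CLI reads credentials from ~/.kaggle/kaggle.json, which is the default on Linux and macOS. If your own output showed a different path, pass that folder instead. It also assumes the permission problem is fixed by making the file readable and writable by its owner only (mode 600, as `chmod 600` would do). The helper names `install_kaggle_json` and `has_private_permissions` are hypothetical.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_json(downloaded: str, config_dir: str = "~/.kaggle") -> Path:
    """Copy a downloaded kaggle.json into config_dir and restrict its permissions.

    Hypothetical helper: ~/.kaggle is assumed to be the folder your `kaggle`
    output pointed to; pass another config_dir if it showed a different path.
    """
    target_dir = Path(config_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)
    # Owner read/write only (same as `chmod 600`), so other users cannot read the API key.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


def has_private_permissions(path: Path) -> bool:
    """Return True if no group or other permission bits are set on the file."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For example, `install_kaggle_json(os.path.expanduser("~/Downloads/kaggle.json"))` copies the token from your downloads folder. Typing kaggle afterwards should then stop complaining about the file.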
Downloading Dataset via CLI
You can open the Kaggle CLI help via
kaggle -h
For help on a specific command, for example downloading competition files, you can type
kaggle competitions download -h
Whatever the Kaggle CLI command is, add -h to get help for it.
While you can explore competitions, datasets, and kernels via the Kaggle CLI, here I am going to focus only on downloading datasets.
Personally, I explore competitions and datasets on the Kaggle website.
Download Entire Dataset
To download the dataset, go to the Data subtab. In the API section, you will find the exact command that you can copy into the terminal to download the entire dataset.
The syntax is:
kaggle competitions download <competition-name>
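If you want to run this from a Python script or notebook instead of typing it, a minimal sketch is to build the same command and hand it to `subprocess`. The helper names are hypothetical. The `-p` option, which sets the destination folder, is part of the CLI's documented `competitions download` options, but check `kaggle competitions download -h` on your installed version. The sketch assumes `kaggle` is on your PATH and your credentials are already set up.

```python
import subprocess
from typing import List, Optional


def competition_download_cmd(competition: str, path: Optional[str] = None) -> List[str]:
    """Build the argument list for downloading a whole competition dataset."""
    cmd = ["kaggle", "competitions", "download", competition]
    if path is not None:
        # -p chooses the destination folder (the CLI defaults to the current directory).
        cmd += ["-p", path]
    return cmd


def download_competition(competition: str, path: Optional[str] = None) -> None:
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(competition_download_cmd(competition, path), check=True)
```

For example, `download_competition("titanic", path="data")` runs `kaggle competitions download titanic -p data`. Here "titanic" is just an example competition name.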
Download Particular File From Dataset
In this example, the data is 34 GB, which is huge.
So instead of downloading the entire dataset, you can select which files to download.
You cannot download multiple files with a single command (as of 2019/Aug/10), so you will have to download them one by one using the following command:
kaggle competitions download -f <file-name> <competition-name>
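Because each command fetches a single file, you may find it convenient to loop over the files you need. The Python sketch below builds one `kaggle competitions download -f <file-name> <competition-name>` command per file and runs them in order. The helper names, and the competition and file names in the usage line, are hypothetical placeholders. It assumes `kaggle` is on your PATH.

```python
import subprocess
from typing import Iterable, List


def file_download_cmds(competition: str, file_names: Iterable[str]) -> List[List[str]]:
    """Build one `kaggle competitions download -f <file-name> <competition-name>` per file."""
    return [
        ["kaggle", "competitions", "download", "-f", name, competition]
        for name in file_names
    ]


def download_files(competition: str, file_names: Iterable[str]) -> None:
    # Run the commands one after another; check=True stops at the first failure.
    for cmd in file_download_cmds(competition, file_names):
        subprocess.run(cmd, check=True)
```

Usage might look like `download_files("some-competition", ["train.csv", "test.csv"])`. Look up the real file names on the competition's Data subtab.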
Then extract the downloaded file and start using it.
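Downloads typically arrive as .zip archives. Assuming that is the case for your files, Python's standard `zipfile` module can extract them, as in this minimal sketch. Both function names are hypothetical. `extract_all` assumes the archives sit together in one folder and extracts each into that same folder.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_zip(archive: str, dest: str) -> List[str]:
    """Extract a downloaded .zip archive into dest and return the member names."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


def extract_all(folder: str) -> List[str]:
    """Extract every .zip directly inside folder (non-recursive) into that folder."""
    extracted: List[str] = []
    for archive in sorted(Path(folder).glob("*.zip")):
        extracted += extract_zip(str(archive), folder)
    return extracted
```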
I usually (plan to) put up a blog post every Saturday and create a YouTube video about it. My next post is a collection of Google Colab tips, which will also include a way to download data from Kaggle into Colab.
If you have any other useful tips, links, or suggestions you would like to share, please put them in the comment section below.
I respond to all my comments.
Thank you for reading so far. Have a good day.