# How caching works in tidywikidatar

library(tidywikidatar)

In order to reduce load on Wikidata’s server and to speed up the processing of data, tidywikidatar makes extensive use of local caching.

## What data are cached locally

There are a few types of data that are cached locally:

• searches run with tw_search()
• data about an item, typically retrieved with tw_get() or tw_get_property()
• labels or descriptions of properties, typically retrieved with tw_get_property_label() and tw_get_property_description()
• qualifiers of properties, typically retrieved with tw_get_qualifiers()
• data retrieved from (or about) Wikipedia pages, with tw_get_wikipedia() and tw_get_wikipedia_page_links()

To reduce the space used for local caching and to speed up processing, it is possible, when relevant, to store only the labels and information available in a given language.
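As a minimal sketch of how a language can be fixed for a session, the snippet below sets a default language and reads it back. It assumes that tw_set_language() and tw_get_language() behave as session-wide setter and getter, and that functions such as tw_get() fall back on this value when no language argument is given; check the package reference before relying on this.

```r
library(tidywikidatar)

# Set the default language for the current session
# (assumption: functions with a `language` argument fall back on this value).
tw_set_language(language = "en")

# Read the setting back to confirm what later calls would use.
session_language <- tw_get_language()
session_language
```

Passing language explicitly to each call, as in tw_get(id = "Q180099", language = "en"), is an alternative when different languages are needed in the same session.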

## Caching with SQLite

In tidywikidatar, it is possible to enable caching with:

tw_enable_cache()

If you pass no further parameters, tidywikidatar uses a local SQLite database for caching.

You can choose the folder where the SQLite database is stored with tw_set_cache_folder(); if that folder does not exist yet, you can create it with tw_create_cache_folder().

tw_set_cache_folder(path = fs::path(fs::path_home_r(),
                                    "R",
                                    "tw_data"))
tw_create_cache_folder()
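For a quick trial that leaves the home directory untouched, the same steps can be pointed at a temporary folder. This is only a sketch: the folder name tw_data_demo is made up for illustration, and the ask = FALSE argument is assumed to skip the interactive confirmation that tw_create_cache_folder() would otherwise request.

```r
library(tidywikidatar)

# Hypothetical throwaway location inside the session's temporary directory.
cache_dir <- fs::path(tempdir(), "tw_data_demo")

# Point tidywikidatar at the folder, create it, then turn caching on.
tw_set_cache_folder(path = cache_dir)
tw_create_cache_folder(ask = FALSE) # assumed to create the folder without prompting
tw_enable_cache()

# Confirm the folder now exists on disk.
fs::dir_exists(cache_dir)
```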

## Caching with other database backends

Other database backends are also supported. The easiest way to use them is shown below; make sure that the relevant database driver and the odbc package are installed first:

tw_enable_cache(SQLite = FALSE)
tw_set_cache_db(driver = "MySQL",
                host = "localhost",
                port = 3306,
                database = "tidywikidatar",
                user = "secret_username",
                pwd = "secret_password")

# for testing, consider running a local database e.g. with:
# docker run --name tidywikidatar_db -p 3306:3306 -e MYSQL_ROOT_PASSWORD=secret_root_password -e MYSQL_USER=secret_username -e MYSQL_PASSWORD=secret_password -e MYSQL_DATABASE=tidywikidatar mysql:latest

It is also technically possible to pass a connection generated with DBI::dbConnect() directly to each function.
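A minimal sketch of passing a connection by hand follows. It assumes that tw_get() accepts the connection through an argument named cache_connection and that disconnect_db = FALSE keeps the connection open after the call; both names should be checked against the function's documentation. An SQLite file in a temporary folder (a made-up path) stands in for whatever backend is actually in use, and the call needs network access to reach Wikidata.

```r
library(tidywikidatar)

# Open a connection manually; the file name is hypothetical and
# SQLite is used only to keep the sketch self-contained.
db_file <- fs::path(tempdir(), "tw_manual_cache.sqlite")
con <- DBI::dbConnect(RSQLite::SQLite(), db_file)

# Hand the connection to the function
# (assumed argument names: cache_connection, disconnect_db).
item_df <- tw_get(
  id = "Q180099",
  language = "en",
  cache = TRUE,
  cache_connection = con,
  disconnect_db = FALSE
)

# Inspect which tables were created, then close the connection ourselves.
DBI::dbListTables(con)
DBI::dbDisconnect(con)
```

Keeping the connection open across several calls avoids reconnecting each time, at the cost of having to close it explicitly at the end.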

## Name of tables in cached databases

Each database has a table for each language and type of content. For example, item information retrieved with tw_get(id = "Q180099", language = "en") will be stored in a table called tw_item_en.

The name of the table is unique and is generated by tw_get_cache_table_name(). For example:

tw_get_cache_table_name(type = "item", language = "en")
#> [1] "tw_item_en"
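Because table names follow a predictable pattern, the tables in a cache database can be listed and filtered with plain DBI calls. The sketch below does not touch a real cache: it builds an in-memory SQLite database with stand-in tables, and sketch_table_name() is a hypothetical helper that reproduces only the documented "tw_item_en" example, not the package's own tw_get_cache_table_name() logic. The columns of the stand-in rows are likewise illustrative.

```r
# Hypothetical helper mirroring the documented example only.
sketch_table_name <- function(type, language) {
  paste("tw", type, language, sep = "_")
}

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Stand-in rows; real cache tables are written by tidywikidatar
# and their columns may differ.
demo_rows <- data.frame(
  id = "Q180099",
  property = "P31",
  value = "Q5",
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(con, sketch_table_name("item", "en"), demo_rows)
DBI::dbWriteTable(con, sketch_table_name("item", "it"), demo_rows)

# List every table, then keep only those holding item data.
all_tables <- DBI::dbListTables(con)
item_tables <- all_tables[grepl("^tw_item_", all_tables)]
item_tables

DBI::dbDisconnect(con)
```

The same dbListTables() and grepl() filter can be applied to a connection opened on an actual cache file to see which languages have already been cached.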