AidData data set
The “AidData” data set was suggested by Prof. Enrico Bertini as a good one for Visualization class assignments. We will use it for a number of assignments in class.
- The AidData data set: (aiddata.xlsx 4.8mb)
- The “Reduced” AidData data set: (aiddata_reduced.csv 0.5mb)
The data set that we will use is a version provided by Prof. Bertini. I’ll refer to this as the “AidData” data set. As this version is still quite large, we have provided a further reduced data set. We will be clear in assignments which version you should use.
The “original” data set is the AidData Core Research Release, Version 3.1. AidData provides a thorough description, and the release has a much larger collection of rows and attributes. You may want to look at it to get more background, or to get details on transactions.
In the data sets we will use, each row represents a financial transaction between two countries. Each data set contains the following attributes:
- Year: year of the commitment
- Donor: country providing the financial resource. There are 42 different donors (you can view this as a categorical variable).
- Recipient: country or organization receiving the money. There are 45 different recipients (you can view this as a categorical variable).
- Commitment Amount: the total amount of financial resources provided
- Coalesced Purpose Name: the purpose of the transaction. While these are strings, you can view this as a categorical variable (although there are 426 different categories in the data set).
The AidData data set has over 98,000 rows (transactions).
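If you want to check the row and category counts above for yourself, a minimal sketch using only the Python standard library might look like the following. The function name `cardinality` is made up for illustration, and the sketch assumes the CSV header row uses the attribute names exactly as listed above; the headers in the actual files may be spelled differently, so check the first line of the file and adjust.

```python
import csv


def cardinality(lines, columns):
    """Count the rows and the distinct values per column in CSV text.

    `lines` is any iterable of CSV lines (an open file works).
    `columns` are header names; the ones used in this course page
    ("Donor", "Recipient", ...) are assumptions about the file header.
    """
    seen = {col: set() for col in columns}
    n_rows = 0
    for row in csv.DictReader(lines):
        n_rows += 1
        for col in columns:
            seen[col].add(row[col])
    return n_rows, {col: len(values) for col, values in seen.items()}
```

To use it on the reduced file, open it with `open("aiddata_reduced.csv", newline="")` and pass the file object together with a list such as `["Donor", "Recipient", "Coalesced Purpose Name"]`. The function returns the number of rows and a dictionary of distinct-value counts per column.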
The “Reduced” AidData data set contains approximately 6000 rows (transactions), which is admittedly still daunting to work with non-programmatically. However, the cardinality of this data set was also reduced: it only contains transactions from the top 10 donors to the top 10 recipients for the top 10 purposes, ranked by overall dollar amount, with ‘Sector not specified’ and ‘Multisector’ left out of the purposes. This reduces the number of stories that can be told with the data, but it ensures that you only have 10 of each to work with, which is much more manageable.
A bit more on the process of downsizing, for those interested: first, the total ‘Commitment Amount’ for each donor, recipient, and purpose was calculated (for example, the ‘Commitment Amount’ for each of the donations made by the US was summed to get their total ‘Commitment Amount’). Then the top 10 donors, recipients, and purposes were found in terms of this total commitment amount. Finally, the original data set was subsetted, preserving only records whose donor, recipient, and purpose all appear in the top 10 lists. In more verbose terms, a record was preserved if one of the top 10 donors committed money to one of the top 10 recipients for use on one of the top 10 purposes. Conversely, if a top 10 donor committed money to one of the top 10 recipients for a purpose that was not one of the top 10 purposes, that record was not preserved.
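The downsizing steps described above can be sketched in a few lines of Python. This is an illustration of the idea, not the script that was actually used to build the reduced file. The names `top_n` and `reduce_rows` are hypothetical, the dictionary keys assume the attribute names listed earlier, and the sketch assumes the two excluded purposes are dropped only when ranking purposes (the description does not say whether they were also left out of the donor and recipient totals).

```python
from collections import defaultdict

# Purposes left out of the purpose ranking, per the description above.
EXCLUDED_PURPOSES = {"Sector not specified", "Multisector"}


def top_n(rows, key, n, amount_key="Commitment Amount", exclude=()):
    """Return the set of the n values of `key` with the largest summed amount."""
    totals = defaultdict(float)
    for row in rows:
        if row[key] in exclude:
            continue
        totals[row[key]] += float(row[amount_key])
    ranked = sorted(totals, key=totals.get, reverse=True)
    return set(ranked[:n])


def reduce_rows(rows, n=10):
    """Keep rows whose donor, recipient, and purpose are all in the top n."""
    donors = top_n(rows, "Donor", n)
    recipients = top_n(rows, "Recipient", n)
    purposes = top_n(rows, "Coalesced Purpose Name", n,
                     exclude=EXCLUDED_PURPOSES)
    return [
        row for row in rows
        if row["Donor"] in donors
        and row["Recipient"] in recipients
        and row["Coalesced Purpose Name"] in purposes
    ]
```

Here `rows` is a list of dictionaries, one per transaction, such as the output of `list(csv.DictReader(f))`. A row is kept only when all three conditions hold, which matches the "top 10 donor to top 10 recipient for a top 10 purpose" rule.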