I have a fairly large conflict dataset (71 million observations) with many variables, recorded at daily frequency.
It comes from the GDELT project and is structured so that, for each day, every observation records a source country and a target country of aggression. For example, on 1 January 2000 many countries engaged in aggressive behaviour against others or themselves, and the dataset tracks each such event.
It looks like this:
clear
input long date_01 str18 source_01 str19 target_01 str4 cameocode_01
20000101 "AFG" "AFGGOV" "020"
20000101 "AFG" "AFGGOV" "0841"
20000101 "AFG" "ARE" "036"
20000101 "AFG" "CVL" "043"
20000101 "AFG" "GOV" "010"
20000101 "AFG" "GOV" "043"
20000101 "AFGGOV" "kasUAF" "0353"
20000101 "AFGGOV" "kasUAF" "084"
20000101 "AFG" "IGOUNO" "030"
20000101 "AFG" "IND" "042"
20000101 "AFG" "IND" "043"
end
What I would like to do is to isolate these events per country.
For instance, I would like to create a variable for the US that, for each date, holds every event in which the US was either a target or a source, together with the corresponding cameo code. The dataset covers a considerable number of countries, but I only need a subset of them and I know their names in advance.
As you can see in the example, the first variable is the date, which for these observations is always 20000101; after a couple of hundred observations it changes to 20000102, denoting a change of day.
The second variable, source_01, is the country attacking another one. In the example, IND is India, AFG is Afghanistan and the other codes are other countries.
The third variable, target_01, is the victim of the conflict.
Finally, cameocode_01 is a level of intensity of conflict, measured with some algorithm that tracks the news in each language.
What I am after is a new variable per country that holds the cameo code of an event whenever that specific country is involved as either source or target.
For this specific example, below is my desired output for the case of India (code IND), which is involved in two events on the specific date:
date INDIAcameo
20000101 "042"
20000101 "043"
I have tried this:
replace INDIA cameo=cameocode if "target" ~ "source" ==IND
However, it returns a type mismatch error, and I doubt it would give me what I am looking for anyway.