Part 2: ERAAnalyze

This part of the document is where we get to the nitty-gritty of the ERA agroforestry data, so it requires us to load a number of R packages for general exploratory data analysis.

The total ERA dataset

We will first need to “split” the ERA data using the ERAg function called “ERAComboSplit.” This function splits practice and product combinations into duplicated individual rows, each containing a unique practice x product combination present in the original observation.
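To make the splitting idea concrete, here is a minimal base R sketch on a toy table. It does not call ERAComboSplit itself: the helper `split_combos()`, the columns `Code`, `PrName` and `Product`, the `-` separator and the output column names are all assumptions made for illustration.

```r
# Toy observations: each row may hold several practices and products,
# joined by "-" (assumed separator, mirroring the practice names in this part).
obs <- data.frame(
  Code    = c("A1", "A2"),
  PrName  = c("Alleycropping-Mulch", "Parklands"),
  Product = c("Maize-Bean", "Sorghum"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: duplicate each row once per practice x product pair.
split_combos <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    practices <- strsplit(df$PrName[i], "-", fixed = TRUE)[[1]]
    products  <- strsplit(df$Product[i], "-", fixed = TRUE)[[1]]
    combos <- expand.grid(PrName.Split  = practices,
                          Product.Split = products,
                          stringsAsFactors = FALSE)
    cbind(df[rep(i, nrow(combos)), , drop = FALSE], combos)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

obs_split <- split_combos(obs)
print(obs_split)
```

The first toy row holds two practices and two products, so it expands to four rows, while the single-practice, single-product row is kept once.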

Let's explore the proportions of data in ERA for each Practice under each Theme, based on the number of studies. Explore the treemap interactively!

Visualising each ERA theme and practice using a tree map gives a good understanding of the proportions of data under each theme and for each practice in the ERA data, based on the number of studies. We see that agroforestry, our focus, accounts for only a limited share of the total ERA data.

Agroforestry data within ERA

Let us now focus on the agroforestry data within ERA by selecting only the data from ERA.Compiled that fall under the ERA Theme “Agroforestry.”

The agroforestry data is fairly large: it contains 9871 observations from 270 studies, with a total of 142 columns. How much of the total ERA data falls under the Theme “Agroforestry”? To answer this, we divide the number of agroforestry observations by the total number of ERA observations and express the result as a percentage.

Index <int>	Code <chr>	Author <chr>	Date <int>	Journal <chr>	DOI <chr>
2580	NN0045	Kho RM	2001	AGROFOREST SYST	10.1023/a:;1011820412140
2581	NN0045	Kho RM	2001	AGROFOREST SYST	10.1023/a:;1011820412140
2582	NN0045	Kho RM	2001	AGROFOREST SYST	10.1023/a:;1011820412140
2583	NN0045	Kho RM	2001	AGROFOREST SYST	10.1023/a:;1011820412140
2584	NN0045	Kho RM	2001	AGROFOREST SYST	10.1023/a:;1011820412140
2589	NN0045	Kho RM	2001	AGROFOREST SYST	10.1023/a:;1011820412140
2590	NN0045	Kho RM	2001	AGROFOREST SYST	10.1023/a:;1011820412140
2769	NN0048	Lamers JPA	1995	TROPENLANDWIRT
2770	NN0048	Lamers JPA	1995	TROPENLANDWIRT
2772	NN0048	Lamers JPA	1995	TROPENLANDWIRT
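The theme filter and the percentage calculation described above can be sketched in base R as follows. The data frame `era` is a toy stand-in for ERA.Compiled, the non-agroforestry theme names are made up, and matching the theme with `grepl()` is an assumed rule, not necessarily the one used in the original analysis.

```r
# Toy stand-in for ERA.Compiled; the real table has far more rows and columns.
era <- data.frame(
  Code  = c("S1", "S1", "S2", "S3", "S3", "S4"),
  Theme = c("Agroforestry", "Agroforestry", "Soil Management",
            "Nutrient Management", "Agroforestry-Nutrient Management",
            "Water Management"),
  stringsAsFactors = FALSE
)

# Keep rows whose Theme mentions "Agroforestry" (assumed matching rule).
agfor <- era[grepl("Agroforestry", era$Theme, fixed = TRUE), ]

# Share of observations under the theme, as a percentage.
pct_agfor <- 100 * nrow(agfor) / nrow(era)
n_studies <- length(unique(agfor$Code))

cat(sprintf("%d observations from %d studies (%.1f %% of all observations)\n",
            nrow(agfor), n_studies, pct_agfor))
```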

About 9 % of the observations in the ERA database currently fall under the Theme “Agroforestry.” This is fairly little: Zomer et al. (2016) mapped the extent of trees on farms on the continent using satellite imagery and geo-datasets and found that nearly 30 % of the agricultural land in Sub-Saharan Africa had at least 10 % tree cover (in both 2000 and 2010), and that nearly 40 % of the people living on agricultural land live in areas that are in some way characterised by agroforestry. In addition, remote sensing data show that, at the global level, 43 % of all agricultural land had at least 10 % tree cover in 2010, and that tree cover had increased by 2 % over the previous ten years. So the question is: does the proportion of “agroforestry” data reflect the reality of farmers and agroecosystems in Africa, or should there be more research on agroforestry in Africa?

Tree Cover (%)	2000				2010
	km2	% of Total Agricultural Land	Population (Millions)	% of Persons Who Live in Agricultural Areas	km2	% of Total Agricultural Land
>10	1,089,278	27.5	67.6	37	1,137,864	28.7
>20	528,602	13.3	28.2	16	582,064	14.7
>30	345,302	8.7	13.0	7	353,961	8.9

Splitting practice and product combinations

We have to split the practice and product combinations of the agroforestry data to visualise the proportions of the individual practices, outcomes and crops within the agroforestry data.

There are 286 sub-practice combinations in the ERA agroforestry data. Let's have a look at these practices.

x
Parklands
Inputs Urea-Parklands
Inputs P-Parklands
Inputs P-Inputs Urea-Parklands
Windbreak
Mulch (nonNfix)-Windbreak
AgFor Prune Mulch (Nfix)-Inputs Kraaling
AgFor Prune Mulch (Nfix)
AgFor Prune Mulch (nonNfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Hedge
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)-Hedge
AgFor Alley (Nfix)-AgFor Alley (nonNfix)
AgFor Prune (Nfix)
AgFor Prune (Nfix)-Inputs N
AgFor Alley (Nfix)-AgFor Prune (Unknown)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)
AgrFor Prune Incorp (Nfix)
AgFor Alley (Nfix)
AgFor Prune Incorp (nonNfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs N
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs P
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs Micro-Inputs N
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs Micro-Inputs N-Inputs P
AgFor Alley (Nfix)-AgFor Prune (Nfix)-AgFor Prune (Unknown)
AgFor Prune Incorp (nonNfix)-Inputs K-Inputs P
AgFor Prune Incorp (nonNfix)-Inputs K-Inputs N-Inputs P
AgFor Alley (nonNfix)-AgFor Prune (Unknown)
AgFor Alley (nonNfix)
Silvopastoral
AgrFor Prune Incorp (Nfix)-Inputs Micro-Inputs N
AgFor Prune (Unknown)-Parklands
AgFor Prune (Unknown)
AgFor Prune Mulch (Nfix)-Inputs N-Inputs P
AgFor Prune Mulch (Nfix)-Inputs K-Inputs N-Inputs P
AgFor Alley (nonNfix)-Inputs K-Inputs N-Inputs P-Inputs Urea
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)
AgFor Alley (Nfix)-Inputs Urea
AgFor Alley (Nfix)-NoTill
AgFor Prune Mulch (Nfix)-Inputs N
AgFor Prune Mulch (Nfix)-Inputs N-Inputs Urea
AgrFor Prune Incorp (Nfix)-Inputs N-pH
AgFor Prune Mulch (Nfix)-Inputs N-pH
AgFor Alley (Nfix)-AgFor Prune Mulch (Nfix)
AgFor Prune Mulch (Nfix)-Inputs Urea
Hedge
Grass Strips-Hedge
AgFor Alley (nonNfix)-AgFor Prune (noID)-AgFor Prune (Unknown)
AgFor Alley (Nfix)-AgFor Prune (noID)-AgFor Prune (Unknown)
AgFor Alley (Nfix)-AgFor Alley (nonNfix)-AgFor Prune (noID)-AgFor Prune (Unknown)
AgFor Prune Incorp (nonNfix)-Residue Incorp (nonNfix)
AgFor Prune (Unknown)-AgFor Prune Incorp (nonNfix)
AgFor Prune (Unknown)-AgFor Prune Incorp (nonNfix)-Inputs N
AgFor Prune Incorp (nonNfix)-Inputs P
AgFor Prune (Unknown)-AgrFor Prune (nonNfix)-Residue Incorp (Nfix)
AgFor Prune (Unknown)-AgrFor Prune (nonNfix)-Inputs P-Residue Incorp (Nfix)
AgFor Prune Incorp (nonNfix)-Inputs N
AgrFor Prune Incorp (Nfix)-Inputs N
AgrFor Prune (nonNfix)-AgrFor Prune Incorp (Nfix)
AgFor Prune (Nfix)-AgrFor Prune Incorp (Nfix)
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)
AgrFor Prune Incorp (Nfix)-Inputs P
AgFor Prune Incorp (nonNfix)-Inputs N-Inputs P
AgrFor Prune Incorp (Nfix)-Inputs N-Inputs P
AgFor Prune Incorp (nonNfix)-Inputs K-Inputs P-Inputs Urea
AgFor Prune Incorp (nonNfix)-Inputs Manure
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs Manure
AgrFor Prune Incorp (Nfix)-Inputs N-Inputs P-Inputs Urea
AgFor Prune Incorp (nonNfix)-Inputs N-Inputs P-Inputs Urea
AgrFor Prune Incorp (Nfix)-Inputs Micro
AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs K-Inputs N-Inputs P
AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)
AgFor Prune Mulch (nonNfix)-Inputs P
AgFor Alley (Nfix)-AgFor Prune Mulch (Nfix)-Inputs Micro-Inputs N
AgFor Prune Mulch (Nfix)-Inputs Micro-Inputs N
AgFor Alley (Nfix)-AgFor Prune Mulch (Nfix)-Inputs N
AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-Inputs Micro-Inputs N
AgFor Alley (Nfix)-AgFor Alley (nonNfix)-AgFor Prune (Unknown)
AgFor Alley (Nfix)-AgFor Alley (nonNfix)-AgFor Prune (Unknown)-Inputs Micro-Inputs N
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-Inputs Micro-Inputs N
AgFor Alley (nonNfix)-Inputs Micro-Inputs N
AgFor Alley (Nfix)-Inputs Micro-Inputs N
AgrFor Prune Incorp (Nfix)-Inputs K-Inputs P-Inputs Urea
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Ridge & Furrow-Seed Improv
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Seed Improv
AgrFor Prune Incorp (Nfix)-Inputs Micro-Inputs N-Inputs P
AgFor Prune (Unknown)-Inputs Micro-Inputs N
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Inputs P
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Inputs N
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Inputs N-Inputs P
AgFor Alley (Nfix)-Seed Improv
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Inputs K-Inputs N-Inputs P
AgFor Alley (Nfix)-AgrFor Prune Incorp (Nfix)-Inputs K
AgFor Alley (Nfix)-AgrFor Prune Incorp (Nfix)-Inputs K-Inputs P
AgFor Alley (Nfix)-AgrFor Prune Incorp (Nfix)
AgFor Alley (Nfix)-AgrFor Prune Incorp (Nfix)-Inputs Micro-Inputs N
AgFor Prune (Nfix)-AgFor Prune (Unknown)-Inputs N
AgFor Prune (Nfix)-AgFor Prune (Unknown)
AgFor Prune Mulch (noID)
AgFor Prune Incorp (nonNfix)-AgrFor Prune Incorp (Nfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-NoTill
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)-Inputs K-Inputs N-Inputs P-Inputs Urea
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Inputs K-Inputs N-Inputs P-Inputs Urea
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs K-Inputs N-Inputs P
AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (nonNfix)-Inputs Manure
AgFor Prune Incorp (nonNfix)-AgrFor Prune Incorp (Nfix)-Inputs Manure-Ridge & Furrow
AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (nonNfix)-Inputs Manure-Inputs Urea
AgFor Prune Incorp (nonNfix)-AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (nonNfix)-AgrFor Prune Incorp (Nfix)-Inputs Urea-Ridge & Furrow
AgFor Prune Incorp (nonNfix)-AgrFor Prune Incorp (Nfix)-Inputs Manure-Inputs Urea-Ridge & Furrow
AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (nonNfix)
AgFor Prune Incorp (nonNfix)-AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (nonNfix)-AgrFor Prune Incorp (Nfix)-Ridge & Furrow
AgFor Prune Incorp (nonNfix)-AgrFor Prune Incorp (Nfix)-Inputs Urea-Ridge & Furrow
AgFor Prune Incorp (nonNfix)-AgrFor Prune Incorp (Nfix)-Ridge & Furrow
AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (nonNfix)-Inputs Urea
AgFor Prune Mulch (nonNfix)-Parklands
AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (nonNfix)-Mulch (nonNfix)
AgFor Prune Mulch (nonNfix)-Mulch (nonNfix)
AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)-Parklands
AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)
AgFor Prune Mulch (nonNfix)-Inputs Compost
AgFor Prune Mulch (nonNfix)-Inputs K-Inputs N-Inputs P-Inputs Urea
AgFor Alley (nonNfix)-Inputs K-Inputs N-Inputs Urea
AgFor Alley (noID)
AgFor Prune Incorp (nonNfix)-Inputs Compost
AgFor Alley (Nfix)-Inputs N-Seed Improv
AgFor Alley (Nfix)-Inputs K-Inputs N-Inputs P
AgFor Alley (Nfix)-Inputs N
AgFor Alley (Nfix)-Inputs N-Inputs P
Agrosilvopastoral
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgrFor Prune Incorp (Nfix)-Inputs Urea
AgFor Prune (Unknown)-Inputs Urea
AgFor Multistrata
AgFor Alley (Nfix)-Inputs K-Inputs N
AgrFor Prune (nonNfix)-Inputs Biosolids-Inputs Compost-Inputs Manure
AgrFor Prune (nonNfix)-Inputs Biosolids
AgrFor Prune (nonNfix)
AgrFor Prune (nonNfix)-Inputs Manure
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgrFor Prune (nonNfix)-Inputs K-Inputs Urea-Mulch (nonNfix)
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgrFor Prune (nonNfix)-Mulch (nonNfix)
AgFor Prune (Unknown)-Inputs K-Inputs N-Inputs P
Scattered Trees
AgFor Prune Mulch (nonNfix)-Inputs K-Inputs N-Inputs P
AgrFor Prune (nonNfix)-Inputs Urea
AgFor Prune (Nfix)-Inputs Urea
AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)-Hedge-Inputs N-NoTill
AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)-Hedge-NoTill
AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)-Hedge
AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Hedge-Inputs N-NoTill
AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Hedge-NoTill
AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Hedge
AgFor Prune (Unknown)-AgFor Prune Mulch (nonNfix)-Hedge-Inputs N
AgFor Prune (Unknown)-AgFor Prune Mulch (Nfix)-Hedge-Inputs N
AgFor Fallow (Nfix)-AgFor Prune (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)
AgFor Prune (noID)
AgFor Prune (noID)-Inputs N
AgFor Prune (noID)-AgFor Prune (Unknown)
Hedge-Mulch (noID)
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-NoTill
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-NoTill
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Prune Mulch (noID)-Inputs K-Inputs N
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Inputs K-Inputs N
AgFor Prune (Unknown)-Inputs K-Inputs N
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Prune Mulch (noID)
AgFor Fallow (Nfix)-Inputs K-Inputs N-Inputs P
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-Inputs K-Inputs N-Inputs P
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-Inputs K-Inputs N-Inputs P
AgFor Alley (Nfix)-AgFor Prune Mulch (noID)
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Prune (Unknown)-Inputs K-Inputs N-Inputs P
AgFor Prune Mulch (noID)-Mulch (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-Mulch (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Inputs K-Inputs N-Inputs P-Mulch (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Mulch (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-Inputs K-Inputs N-Inputs P
AgFor Prune Mulch (noID)-NoTill
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)
AgFor Alley (Nfix)-AgFor Alley (nonNfix)-AgFor Prune (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Rotation (Mixed)
AgFor Alley (Nfix)-Rotation (Mixed)
AgFor Fallow (Nfix)-AgFor Prune Incorp (noID)
Grass Strips-Hedge-MinTill
AgFor Prune Incorp (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Intercrop (Mixed)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs N-Inputs P-Intercrop (Mixed)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-AgFor Prune Mulch (noID)-Intercrop (Mixed)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Intercrop (Mixed)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-Planting Basins
AgFor Alley (noID)-AgFor Prune (Unknown)
AgFor Alley (nonNfix)-MinTill-Mulch (noID)
AgFor Prune Incorp (noID)-Green Manure (Nfix; Space)
AgFor Prune Incorp (noID)-Green Manure (Nfix; Space)-Inputs K-Inputs N-Inputs P
AgFor Alley (nonNfix)-AgFor Prune (noID)-AgFor Prune (Unknown)-Rotation (Mixed)
AgFor Fallow (nonNfix)
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Inputs K-Inputs P-Inputs Urea
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Rotation (Mixed)
AgFor Alley (nonNfix)-Intercrop (Nfix)
AgFor Alley (nonNfix)-Rotation (Mixed)
AgFor Fallow (Nfix)
AgFor Fallow (Nfix)-Inputs K
AgFor Prune Incorp (noID)-Inputs P
AgFor Prune Mulch (noID)-Inputs P
AgFor Fallow (Nfix)-Ridge & Furrow
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Inputs K-Inputs P-Inputs Urea
AgFor Fallow (Nfix)-Intercrop (nonNfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Intercrop (Mixed)-Ridge & Furrow-Seed Improv
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Intercrop (Mixed)-Seed Improv
AgFor Prune Incorp (noID)-Green Manure (Nfix; Space)-Inputs N
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs Manure
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs Manure
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs Urea
AgFor Fallow (Nfix)-Mulch (noID)-NoTill-Rotation (Mixed)
AgFor Fallow (Nfix)-Rotation (Mixed)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Intercrop (Mixed)-pH
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs P
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs Urea-Intercrop (Mixed)
AgFor Alley (Nfix)-AgFor Prune Incorp (noID)
AgFor Fallow (Nfix)-AgFor Prune Incorp (noID)-Inputs K-Inputs N-Inputs P
AgFor Fallow (Nfix)-AgFor Prune Incorp (noID)-NoTill
AgFor Fallow (Nfix)-AgFor Prune Incorp (noID)-Inputs K-Inputs N-Inputs P-NoTill
AgFor Fallow (Nfix)-AgFor Prune (noID)-NoTill
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-NoTill
AgFor Fallow (Nfix)-NoTill
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Imp Fallow (Nfix)-Residue Incorp (noID)
AgFor Prune (Unknown)-Hedge
AgFor Prune (Unknown)-Hedge-Inputs N-Inputs P
AgFor Prune (Unknown)-Inputs N-Inputs P
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Intercrop (nonNfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs K-Inputs N-Inputs P-Intercrop (nonNfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs K-Inputs N-Inputs P
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Intercrop (nonNfix)-Residue Incorp (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs K-Inputs N-Inputs P-Intercrop (nonNfix)-Residue Incorp (noID)
AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Hedge
AgFor Prune Mulch (noID)-Intercrop (nonNfix)
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-Ridge & Furrow
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-Inputs Manure-Ridge & Furrow
AgFor Prune (Unknown)-Inputs Manure
AgFor Fallow (Nfix)-AgFor Prune (Unknown)-Inputs K-Inputs N-Inputs P-Seed Improv
AgFor Fallow (Nfix)-Inputs K-Inputs N-Inputs P-Seed Improv
AgFor Prune (Unknown)-Intercrop (nonNfix)
AgFor Alley (Nfix)-Intercrop (nonNfix)
AgFor Alley (nonNfix)-Intercrop (nonNfix)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs Urea-Rotation (Mixed)
AgFor Multistrata-Intercrop (Mixed)
AgFor Alley (nonNfix)-Intercrop (Mixed)
AgFor Multistrata-Inputs Manure-Intercrop (Mixed)
AgFor Alley (Nfix)-Intercrop (Mixed)-Seed Improv
AgFor Alley (Nfix)-Intercrop (Mixed)
AgFor Alley (Nfix)-Rotation (Mixed)-Seed Improv
AgFor Alley (Nfix)-Inputs K-Inputs N-Inputs Urea-Intercrop (Mixed)
AgFor Alley (Nfix)-Inputs K-Inputs N-Inputs Urea-Intercrop (Mixed)-Seed Improv
AgFor Alley (Nfix)-Inputs K-Inputs N-Inputs Urea-Rotation (Mixed)-Seed Improv
AgFor Alley (nonNfix)-AgFor Prune (noID)-AgFor Prune (Unknown)-Inputs K-Inputs Urea-Mulch (noID)
AgFor Alley (nonNfix)-AgFor Prune (noID)-AgFor Prune (Unknown)-Mulch (noID)
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Incorp (noID)-Inputs K-Inputs N-Inputs P-Inputs Urea
AgFor Alley (nonNfix)-Green Manure (nonNfix; Time)-Rotation (nonNfix)
AgFor Fallow (Nfix)-AgFor Fallow (nonNfix)
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Fallow (nonNfix)-AgFor Prune (noID)-AgFor Prune (Unknown)
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Fallow (nonNfix)
AgFor Alley (Nfix)-AgFor Fallow (Nfix)-AgFor Prune (Unknown)
AgFor Alley (nonNfix)-AgFor Fallow (nonNfix)-AgFor Prune (Unknown)
AgFor Alley (nonNfix)-AgFor Fallow (nonNfix)
AgFor Alley (nonNfix)-AgFor Fallow (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)
AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Hedge-Inputs N-NoTill
AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Hedge-NoTill
AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Hedge-Inputs N
AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Hedge-Inputs K-Inputs N-Inputs P
AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Hedge-Inputs K-Inputs P
AgFor Alley (Nfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Inputs K-Inputs N-Inputs P
AgFor Alley (nonNfix)-AgFor Prune (Unknown)-AgFor Prune Mulch (noID)-Inputs K-Inputs N-Inputs P
AgFor Prune Mulch (noID)-Imp Fallow (Nfix)-Intercrop (Mixed)
AgFor Prune Mulch (noID)-Imp Fallow (Nfix)-Inputs Manure-Intercrop (Mixed)
AgFor Prune Mulch (noID)-Imp Fallow (Nfix)-Inputs K-Inputs N-Inputs P-Inputs Urea-Intercrop (Mixed)
Inputs Compost-Scattered Trees
AgFor Other
AgFor Other-NoTill
AgFor Other-Mulch (nonNfix)-NoTill
AgFor Prune Mulch (Nfix)-Imp Fallow (Nfix)
AgFor Alley (Mixed)
AgFor Prune Mulch (noID)-Inputs K-Inputs P
AgFor Prune Mulch (noID)-Inputs K-Inputs N-Inputs P
AgFor Prune Mulch (nonNfix)-MinTill-Planting Basins
AgFor Prune Mulch (Nfix)-MinTill-Planting Basins
AgFor Prune Mulch (nonNfix)-NoTill
AgFor Prune Mulch (nonNfix)-Intercrop (Mixed)-NoTill
AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (noID)-Inputs Urea
AgFor Prune Mulch (Nfix)-AgFor Prune Mulch (noID)
AgrFor Prune Incorp (Nfix)-Inputs Urea
AgFor Alley (nonNfix)-AgFor Prune Mulch (noID)
AgFor Prune Mulch (noID)-Intercrop (Mixed)-Mulch (nonNfix)
AgFor Prune Incorp (noID)-Scattered Trees
AgFor Prune Mulch (nonNfix)-Mulch (noID)
AgFor Prune (Unknown)-Inputs N
AgFor Alley (Nfix)-AgFor Prune Mulch (noID)-MinTill
AgFor Alley (Nfix)-AgFor Prune Mulch (noID)-NoTill
AgFor Prune Mulch (nonNfix)-Inputs P-Residue Unkn (nonNfix; NoFate)
AgFor Prune Mulch (nonNfix)-Inputs Manure-Residue Unkn (nonNfix; NoFate)
AgFor Prune (Nfix)-Green Manure (Nfix; Time)
AgFor Alley (Nfix)-Bunds-Trench
AgFor Prune Incorp (noID)-Parklands
AgFor Multistrata-Inputs K-Inputs N-Inputs P
AgFor Multistrata-Inputs K-Inputs Micro-Inputs P
AgFor Prune Mulch (noID)-Imp Fallow (Nfix)-Inputs K-Inputs P
AgFor Prune Mulch (noID)-MinTill-Mulch (noID)-NoTill

The ERAComboSplit function in the ERAg package was specially developed for splitting the ERA sub-practices. By using it on our agroforestry data, we can find the proportions of the different practices and sub-practices, products and outcomes within the agroforestry data, and again use the tree map function to visualise them. This is possible because ERA data is grouped, or nested, in a hierarchical (tree-based) structure. We illustrate the proportions based on the number of studies for each category and sub-category, and then show these as represented within the agroforestry data (ERA Theme: “Agroforestry”).
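As a rough illustration of "proportions based on the number of studies" in a nested structure, the sketch below counts unique studies per practice and sub-practice on toy data. The column names `Code`, `PrName` and `SubPrName` and all values are assumptions; the resulting shares are the kind of quantity a tree map would draw as tile areas.

```r
# Toy split data: one row per observation, with study code and hierarchy levels.
agfor_split <- data.frame(
  Code      = c("S1", "S1", "S2", "S3", "S3", "S4"),
  PrName    = c("Alleycropping", "Alleycropping", "Alleycropping",
                "Parklands", "Parklands", "Alleycropping"),
  SubPrName = c("AgFor Alley (Nfix)", "AgFor Alley (Nfix)", "AgFor Alley (Nfix)",
                "Parklands", "Parklands", "AgFor Alley (nonNfix)"),
  stringsAsFactors = FALSE
)

# Count each study once per practice / sub-practice, not once per observation.
studies <- unique(agfor_split[, c("Code", "PrName", "SubPrName")])
counts <- aggregate(list(N.Studies = studies$Code),
                    by = list(PrName = studies$PrName,
                              SubPrName = studies$SubPrName),
                    FUN = length)

# Share of studies per tile of the hierarchy.
counts$Share <- counts$N.Studies / sum(counts$N.Studies)
print(counts[order(-counts$N.Studies), ])
```

Counting studies rather than observations keeps a single large study from dominating the picture, which is why duplicates are removed before aggregating.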

Let's split the practice and sub-practice combinations from the agroforestry data using the ERAComboSplit function.

Agroforestry data: practices and sub-practices

Explore the interactive tree map of the proportions of unique Sub-Practices nested within their respective Practices for the agroforestry data.

We see that Agroforestry Pruning and Agroforestry Pruning-Alleycropping are the two most represented agroforestry practices in the data. Together with the next two most represented practices, Alleycropping and Agroforestry Pruning-Inorganic Fertiliser, they account for about half of all the data. This shows that most research is conducted on these practices, whereas other agroforestry practices, such as Parklands, are less represented.

Agroforestry data: sub-outcomes and practices

This treemap shows the proportions of unique Sub-Outcomes nested within their respective Practices for the agroforestry data. As much of the data does not have specific sub-outcomes for the practice, this is better viewed statically.

Within the agroforestry data, the common agronomic outcomes crop yield and biomass yield are the most represented, so a lot of data are present for these outcomes. In contrast, there is little data on outcomes such as runoff or biodiversity. This indicates that any further analysis we wish to perform using machine learning techniques would benefit greatly from focusing on data-rich outcomes such as crop and/or biomass yield.

Agroforestry data: Crops

This treemap shows the proportions of different crops within the agroforestry data. Again, it is better to view this statically:

We find that the majority of crops are cereals, and mostly annual crops. Maize is the most represented of all crops (it has the largest proportion). This is in line with agricultural statistics (FAO, 2021): maize is grown on over 40 million ha of land in Africa, is the primary cereal in over half of the countries in Africa, and is one of the top two cereals in over three-quarters of these countries.

Note: It is interesting that there is a great variety of crops: many annuals, especially cereals, but also many crops we normally associate with being biennial or perennial, such as yam (Dioscorea), banana (genus: Musa) and coffee (genus: Coffea).

Agroforestry data: Geographic distribution

We can use the ERAHexPlot function from the ERAg package to map where in Africa the agroforestry data come from. We do this by plotting the number of studies for each ERA location on a map of the continent.

Countries where the agroforestry data comes from

Which countries contribute the most agroforestry observations? Let's view the countries and their respective proportional contributions to the ERA agroforestry data:

We see that Nigeria, Zimbabwe and Kenya are the countries where most of the ERA agroforestry data is coming from.
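A country ranking of this kind boils down to counting observations per country and converting the counts to shares. The sketch below shows this in base R on toy data; the column name `Country`, the placeholder country labels and the counts are illustrative assumptions, not ERA results.

```r
# Toy agroforestry observations with placeholder country labels.
agfor <- data.frame(
  Code    = c("S1", "S1", "S2", "S3", "S4", "S4", "S4", "S5"),
  Country = c("Country A", "Country A", "Country B", "Country C",
              "Country A", "Country A", "Country A", "Country B"),
  stringsAsFactors = FALSE
)

# Observations per country, largest contributor first.
obs_by_country <- sort(table(agfor$Country), decreasing = TRUE)

# Proportional contribution of each country, in percent.
share <- round(100 * prop.table(obs_by_country), 1)
print(share)
```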

ERA data on crop life span

The goal is to subset the agroforestry data so that it includes only annual crops.

Before we proceed, we first have to subset the ERA agroforestry data to include only annual crops. Why? Simply because it is not meaningful to analyse together data on crops with very different life spans. Just imagine comparing the effect of agroforestry on the yield of cacao (Theobroma cacao) with its effect on the yield of potato (Solanum tuberosum). Cacao obviously has a very different growth and development cycle from potato, so we cannot simply aggregate the two and perform the same kind of analysis.

We are going to subset the agroforestry data so that we proceed only with data on annual crops. We can use the information on crop life span from the ERA_EcoCrop dataset, which is part of the ERAg package. First, we identify the individual experimental units (EU) of the ERA.Compiled dataset through another built-in ERA dataset, called EUCodes, which contains information on the unique products within ERA. We then create a new data frame with both EU codes and scientific species names (ECOCROP.Name). Finally, we use this newly created dataset to merge with the observations in the ERA.Compiled dataset.

The ERA_EcoCrop dataset now contains a column with information on the scientific crop species and a column with the associated life span of that particular crop.

We can now transfer the Life.span information from the ERA_EcoCrop data to the EUCodes by merging the two datasets on their shared scientific crop species names. Next, we merge the data on crop life span with the associated EU codes, so that we end up with a dataset that we can merge with the ERA.Compiled dataset based on EU codes.

Now we can finally merge our information on the life span of each crop species with the ERA.Compiled dataset, joining by EU codes. The result is an added column in the ERA.Compiled data that tells us whether a particular crop is annual, biennial and/or perennial.
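The chain of merges described in the last few paragraphs can be sketched with three toy tables. The names `ECOCROP.Name`, `Life.span` and `species` follow the text and tables in this section; the key column `EU`, the table contents and the use of base R `merge()` are assumptions for illustration.

```r
# Toy stand-ins for ERA_EcoCrop, EUCodes and ERA.Compiled.
eco_crop <- data.frame(
  species   = c("Zea mays", "Theobroma cacao"),
  Life.span = c("annual", "perennial"),
  stringsAsFactors = FALSE
)
eu_codes <- data.frame(
  EU           = c("c1", "c2", "c3"),
  ECOCROP.Name = c("Zea mays", "Theobroma cacao", "Unknown sp."),
  stringsAsFactors = FALSE
)
compiled <- data.frame(
  Code = c("S1", "S2", "S3", "S4"),
  EU   = c("c1", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Step 1: attach life span to each EU code via the scientific species name.
eu_life <- merge(eu_codes, eco_crop,
                 by.x = "ECOCROP.Name", by.y = "species", all.x = TRUE)

# Step 2: attach life span to each observation via the EU code.
compiled_life <- merge(compiled, eu_life[, c("EU", "Life.span")],
                       by = "EU", all.x = TRUE)
print(compiled_life)
```

Using `all.x = TRUE` keeps every observation even when no life span is found, so crops without a match end up with `NA` rather than silently disappearing.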

When left_join() encounters columns with the same name in both tables, it automatically adds a suffix of “.x” or “.y” to make them unique. Before we can proceed with the analysis, we need to clean the ERA_Compiled_LifeSpan data by removing the redundant columns that were created.
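The suffix clean-up can be illustrated on a small example. Note that this sketch swaps in base R `merge()` for `left_join()` so that it runs without extra packages; `merge()` also appends ".x" and ".y" by default to non-key columns that share a name. The tables and column names are toy assumptions.

```r
left  <- data.frame(EU = c("c1", "c2"), Product = c("Maize", "Cacao"),
                    Yield = c(2.1, 0.8), stringsAsFactors = FALSE)
right <- data.frame(EU = c("c1", "c2"), Product = c("Maize", "Cacao"),
                    Life.span = c("annual", "perennial"),
                    stringsAsFactors = FALSE)

# "Product" exists in both tables, so it comes back as Product.x and Product.y.
joined <- merge(left, right, by = "EU", all.x = TRUE)
print(names(joined))

# Drop the duplicated ".y" columns and strip ".x" from the ones we keep.
cleaned <- joined[, !grepl("\\.y$", names(joined)), drop = FALSE]
names(cleaned) <- sub("\\.x$", "", names(cleaned))
print(names(cleaned))
```

Dropping the ".y" copies is only safe when both copies hold the same values, as they do here; otherwise it is worth comparing them before discarding one.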

Now we have our compiled ERA data with clean information on whether each crop is annual or not. Let's see how the life span of crops in the ERA data is distributed.

Using a visualisation such as a bar plot, we can get a good idea of the proportions of crop life span classes in ERA’s data.

It is evident that the majority of crops in ERA are “annual.” However, we also see that a considerable amount of the ERA data lacks information on whether a crop is annual or not.

species <chr>	Life.span <chr>	n <int>
Abelmoschus manihot	annual, perennial	2
Abelmoschus esculentus	annual	1
Abelmoschus moschatus	annual, biennial, perennial	1
Abies amabilis	perennial	1
Abies balsamea	perennial	1
Abies concolor	perennial	1
Abies pindrow	perennial	1
Abroma augustum	perennial	1
Abutilon theophrasti	annual	1
Acacia abyssinica	perennial	1

Agroforestry data with only annual crops

Now that we have information on the life span of the crops in ERA, we again select data from the ERA Theme “Agroforestry,” but this time we also filter on the newly added life span column. In this way we make sure that our data include only agroforestry observations with annual crop species.

By filtering out all non-annual crops, the number of observations drops by about 32 percent (from 9871 to 6745). Before we proceed, let's check that we really have only agroforestry data with annual crops.

Filtering for crops = annual	Total number of observations in the dataset	Total number of column features in the dataset
Agroforestry data before filtering	9871	216
Agroforestry data after filtering	6745	216
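The filter, the sanity check and the size of the reduction can be sketched as follows. The toy data frame and the strict rule of keeping only rows whose `Life.span` is exactly "annual" are assumptions; the two observation counts are the ones reported in the table above.

```r
# Toy agroforestry data with a life span column, including NA and a mixed class.
agfor_life <- data.frame(
  Code      = c("S1", "S2", "S3", "S4", "S5"),
  Life.span = c("annual", "perennial", "annual", NA, "annual, perennial"),
  stringsAsFactors = FALSE
)

# Keep only rows classified strictly as "annual" (assumed rule);
# rows with NA or mixed classes are dropped.
agfor_annual <- agfor_life[!is.na(agfor_life$Life.span) &
                             agfor_life$Life.span == "annual", ]

# Sanity checks: no missing life spans, and only one class left.
stopifnot(!anyNA(agfor_annual$Life.span))
print(table(agfor_annual$Life.span))

# Relative reduction for the observation counts reported in the table.
before <- 9871
after  <- 6745
reduction_pct <- 100 * (before - after) / before
print(round(reduction_pct, 1))
```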

The line “< table of extent 0 >” indicates that we indeed do not have any NA observations in the data. The next line, “Life.span: annual,” indicates that we successfully created a dataset of agroforestry data with only annual crops, based on the Life.span column.

Next, we are going to proceed with a proper analysis of the ERA agroforestry dataset using the ERAAnalyze function. This function is part of the ERAg package and serves to analyse the response ratios (RR).
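To make the idea of a response ratio concrete, the sketch below computes a log response ratio from a treatment mean (`MeanT`) and a control mean (`MeanC`) and averages it per practice. This is a simplified illustration on toy numbers: it uses an unweighted mean and is an assumption about the general approach, not a reproduction of what ERAAnalyze does internally.

```r
# Toy data: treatment and control means for two practices.
d <- data.frame(
  PrName = c("Alleycropping", "Alleycropping", "Parklands", "Parklands"),
  MeanT  = c(3.0, 2.4, 1.5, 1.8),
  MeanC  = c(2.0, 2.0, 1.5, 2.0),
  stringsAsFactors = FALSE
)

# Log response ratio per observation: above 0 means treatment exceeds control.
d$lnRR <- log(d$MeanT / d$MeanC)

# Unweighted mean per practice (assumption: no weighting scheme is applied here).
rr <- aggregate(lnRR ~ PrName, data = d, FUN = mean)
rr$RR        <- exp(rr$lnRR)       # back-transform to a ratio
rr$PctChange <- 100 * (rr$RR - 1)  # percentage change relative to control
print(rr)
```

Averaging on the log scale and back-transforming treats a doubling and a halving symmetrically, which is the usual reason for working with log ratios.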

It is recommended to clean and prepare the ERA dataset before feeding it into the ERAAnalyze function. Luckily, the ERAg package has another handy function for this crucial step, fittingly called PrepareERA. So before we start using ERAAnalyze, let's make some preparations using the PrepareERA function.

Using the ERAAnalyze function

The ERAAnalyze function is a (meta-)analysis function that analyses outcome ratios in the ERA dataset for each combination of grouping variables, as specified by column names in the Aggregate.By parameter. It is recommended to apply the PrepareERA function to the data before using ERAAnalyze. In detail, the ERAAnalyze function performs the following on the data:

Define focus practices

What practices do we find in the ERA agroforestry data? We need all of these practices for our ERAAnalyze function, as we are interested in the most aggregated form of the agroforestry data, with all practices included.

We define the focus practices as all nine ERA agroforestry practices and then view the number of observations for each agroforestry practice.

We see that the vast majority of observations are for the practices “Agroforestry Pruning,” “Agroforestry Pruning-Alleycropping” and “Agroforestry Pruning-Inorganic Fertilizer.”

The definitions of ERA practices can be found using the function ERAg::PracticeCodes(). Here we apply the function to look at the definitions of “FMNR,” “Alleycropping” and “Multistrata Agroforestry.” Alleycropping has four definitions because it comprises four different Sub-Practices.

PrName <chr>
Agroforestry Pruning
Agroforestry Pruning-Alleycropping
Agroforestry Pruning-Inorganic Fertilizer
Alleycropping
Agroforestry Pruning-Alleycropping-Inorganic Fertilizer
Agroforestry Fallow-Agroforestry Pruning
Parklands
Agroforestry Pruning-Reduced Tillage-Water Harvesting
Agroforestry Fallow
Agroforestry Pruning-Boundary Planting

Practice	Definition(s)
FMNR	“Systematic regrowth of trees or shrubs on agricultural, forest or pasture land. FMNR is used in areas where there are stumps that can coppice or seeds that can germinate. Sometimes called farmer assisted regeneration.”
Alleycropping	“Intercropping with rows or alleys of nitrogen fixing trees or woody shrubs.”
	“Intercropping with rows or alleys of trees or woody shrubs that do not fix nitrogen.”
	“Intercropping with rows or alleys of a mixture of trees or woody shrubs of which some but not all fix nitrogen.”
	“Intercropping with rows or alleys of trees or woody shrubs where no information about the type of plant is given.”
Multistrata Agroforestry	“Multistorey systems agroforestry systems have several spatial strata occupied by different tree (i.e. woody) crops (coffee, tea, cacao, banana etc).”

Threshold: Minimum number of observations

Clicking the arrow to the right in the interactive table above shows us that some practices have very few observations, and this can negatively impact the power and robustness of our analysis with ERAAnalyze. So let’s subset the dataset to practices (PrName) that have a minimum of 25 observations. This is a rather arbitrary number that depends a lot on what one wishes to analyse and how much data is available in the first place.
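The observation threshold can be sketched in base R as follows. This is a minimal illustration, not the original chunk: the data frame agroforestry.toy and its values are hypothetical stand-ins, and only the column name PrName and the threshold of 25 come from the text.

```r
# Hypothetical stand-in for the agroforestry data; only PrName matters here.
agroforestry.toy <- data.frame(
  PrName = c(rep("Agroforestry Pruning", 30),
             rep("Parklands", 26),
             rep("Agroforestry Fallow", 4)),
  value = seq_len(60),
  stringsAsFactors = FALSE
)

min_obs <- 25  # threshold used in the text

# Count observations per practice and keep practices that reach the threshold.
obs_per_practice <- table(agroforestry.toy$PrName)
keep_practices <- names(obs_per_practice)[as.vector(obs_per_practice) >= min_obs]

agroforestry.thresholded <-
  agroforestry.toy[agroforestry.toy$PrName %in% keep_practices, ]

nrow(agroforestry.toy)          # rows before the threshold
nrow(agroforestry.thresholded)  # rows after the threshold
```

Comparing the two row counts gives the same kind of before/after summary as the table that follows.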

The threshold reduces the total dataset available for ERAAnalyze from 6745 to 6361 observations. We did lose a bit of data, but hopefully our later analysis will be more powerful and robust. This is a fine-tuning trade-off, and one has to decide on such a threshold based on the goal of the analysis, the data available to answer the relevant questions and expert knowledge.

Threshold	Total number of observations in the dataset	Total number of column features in the dataset
Before PrName obs >= 25 threshold	6745	217
After PrName obs >= 25 threshold	6361	217

Note: Later we will apply a second threshold to our data so that each result is based on a minimum of 2 studies (see section Calculating RR with ERAAnalyze).

CREATING THE NICE RIDGE LINE PLOTS OF EACH PRACTICE

Using the PrepareERA function

As mentioned above, there are some important pre-processing steps that we need to apply to the data before using the ERAAnalyze function. Luckily the ERAg::PrepareERA function can perform these steps for us. They include dealing with negative response ratio outcomes and dealing with inverse outcomes, such as reversing MeanC (control) and MeanT (treatment) if it happens that MeanC is better than MeanT. The steps performed in ERAg::PrepareERA are:

Alright. Now that we know what the PrepareERA function does, we apply it to the ERA agroforestry data below.

We have lost some column features from the dataset, as expected when PrepareERA is applied. We have also lost a few observations, indicating that there were negative outcomes that would have caused problems in the ERAAnalyze function.

	Total number of observations in the dataset	Total number of column features in the dataset
Before PrepareERA	6361	217
After PrepareERA	6277	66

Calculating RR with ERAAnalyze

Now we can perform the analysis to calculate response ratios (RR) for each combination of outcome sub-indicator and practice using the ERAAnalyze function. This will tell us which combination of agroforestry practice and outcome yields the best response ratio. Interesting, right? We can ultimately use this information to answer a question such as: which agroforestry practices are significantly better at increasing Soil Carbon than their non-agroforestry counterparts (monoculture)?

Let’s have a look at the various columns that come out of the ERAAnalyze function:

Note: For a detailed description of the ERAAnalyze output, type ?ERAg::ERAAnalyze in R, provided you have the ERAg package installed.

Because we have a relatively sparse number of observations for some combinations of our grouping variables, we can use the data availability fields to filter the results based on a minimum number of studies. It is always good to have more than one study contributing to the RR. The number of studies one wishes to set as a threshold depends on the specific analysis and on the power and robustness one wishes to obtain. Here we keep only the combinations that meet a minimum data requirement based on the number of studies.

Threshold: Minimum number of studies

Luckily we are working with a highly aggregated dataset with fairly large amounts of observations from plenty of studies. So we can specify a threshold of a minimum of 2 studies.
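A minimal sketch of this filter, assuming the analysed output has one row per Practice and Sub-Outcome combination and a Studies column holding the number of contributing studies. The data frame analyzed.toy and its values are hypothetical.

```r
# Hypothetical stand-in for the ERAAnalyze output.
analyzed.toy <- data.frame(
  PrName     = c("Agroforestry Pruning", "Agroforestry Pruning",
                 "Parklands", "Alleycropping"),
  Out.SubInd = c("Crop Yield", "Erosion", "Crop Yield", "Soil NO3"),
  Studies    = c(5, 1, 3, 1),
  stringsAsFactors = FALSE
)

min_studies <- 2  # threshold used in the text

# Keep only combinations backed by at least min_studies studies.
analyzed.min.studies <- analyzed.toy[analyzed.toy$Studies >= min_studies, ]

nrow(analyzed.toy)          # combinations before the threshold
nrow(analyzed.min.studies)  # combinations after the threshold
```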

	Total number of Sub-Outcome and Practice combinations	Total number of column features
Before Studies >= 2	166	40
After Studies >= 2	74	40

Note: Again, this is a rather arbitrary number that depends on what exactly one wishes to analyse and how much data is available after ERAAnalyze.

Yes, we did lose some of our data, but in this way we obtain relatively more robust analysis outcomes. Hence we can make more generalisable conclusions, which is especially important in this case, as we wish to look at the general impact of agroforestry across crop and biomass yields for annual crops.

The potential and actual threshold for the number of studies is limited by the total amount of data available. For very disaggregated data (e.g. the effect of the Practice “Agroforestry Pruning under Reduced Tillage with Water Harvesting” on the Outcome “Labour Hours per Person”) there is typically very little data available, possibly from only a few studies, so we have no choice but to accept RRs derived from relatively poor quality data.

RR of agroforestry practices

Note: A response ratio (RR) is simply the natural log of the ratio of the experimental outcome to the control outcome. If maize yields with planting basins are 1.5 Mg/ha and without them are 1.1 Mg/ha the response ratio is log(1.5/1.1) = 0.310. RRs greater than zero indicate the experimental treatment is better than the control and vice-versa for RRs less than zero.
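The worked example in the note can be reproduced in a couple of lines of R. The variable names are illustrative only.

```r
# Maize yields from the example in the note (Mg/ha).
mean_treatment <- 1.5  # with planting basins
mean_control   <- 1.1  # without planting basins

# Response ratio: natural log of treatment over control.
rr <- log(mean_treatment / mean_control)
round(rr, 3)

# A positive RR means the treatment outperforms the control.
rr > 0
```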

Let’s now have a look at the response ratios of the agroforestry data, while remembering the most important information generated by the ERAAnalyze function. The highlighted columns are the most important to remember, as these are what we are going to use to evaluate the response ratio performance of agroforestry practices.

Note: For easier interpretation, the mean RR and RR +/- standard errors are back-transformed and converted into percentage change (e.g., a ratio of 1.1 = a 10% increase). The RR.pc. columns back-transform log ratios using the exponent. The RR.pc.jen. columns back-transform log ratios using the exponent with a correction for Jensen’s inequality (Tanadini and Mehrabi, 2017). We are going to use these back-transformed and corrected response ratios in percent units from RR.pc.jen, as these are easier to interpret.
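As a rough sketch of the back-transformation: the simple version exponentiates the mean log ratio and expresses it as a percentage change. For the corrected version, the code below assumes the usual log-normal mean correction of adding half the variance of the log ratios before exponentiating. Whether ERAg computes RR.pc.jen in exactly this way is an assumption here, and rr_var is a hypothetical value.

```r
rr_mean <- log(1.1)  # mean log response ratio (a ratio of 1.1)
rr_var  <- 0.04      # hypothetical variance of the log ratios

# Simple back-transformation to percentage change.
pct_simple <- 100 * (exp(rr_mean) - 1)

# Back-transformation with an assumed Jensen-type correction
# (half the variance added on the log scale before exponentiating).
pct_jensen <- 100 * (exp(rr_mean + rr_var / 2) - 1)

pct_simple
pct_jensen
```

Under this assumption the corrected value is always at least as large as the simple one, because the exponential of a mean understates the mean of the exponentials.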

Out.SubInd <chr>
Soil Organic Carbon
Crop Yield
Biomass Yield
Crop Yield
Crop Yield
Crop Yield
Soil Organic Carbon
Crop Yield
Crop Yield
Soil Organic Matter

Back to the agroforestry data: Density distributions

Before we get into the results of the ERAAnalyze function, let us first check the density distribution for the most data rich combination of Outcome and Practice to get a better idea of the data. This is the Outcome crop and biomass yield and the Practice “Agroforestry Pruning.” Hence, we are going to look at the density distribution of the RR of the Crop Yield variable.

We can make a similar plot for the proportions of MeanC/MeanT with the agroforestry data of Outcome crop and biomass yield and the Practice “Agroforestry Pruning”:

This gives us a clear indication of why the log-transformation is important for our RR outcome. Without it, the distribution would have this long tail.
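The effect of the log-transformation on the tail can be illustrated with simulated data. The ratios below are randomly generated for illustration only and are not ERA values. The skewness helper is a simple moment-based estimate defined here, not a function from a package.

```r
set.seed(1)

# Simulated (not ERA) treatment/control ratios with a long right tail.
log_ratio <- rnorm(1000, mean = 0.2, sd = 0.6)
ratio     <- exp(log_ratio)

# Simple moment-based skewness estimate.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_ratio <- skewness(ratio)       # raw ratios: strongly right-skewed
skew_log   <- skewness(log(ratio))  # log ratios: roughly symmetric

skew_ratio
skew_log

# plot(density(ratio)) and plot(density(log(ratio))) show the same contrast.
```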

Results of ERAAnalyze

Great! Now we can interpret the results coming from ERAAnalyze and ask ourselves some pretty interesting questions. For example, does practising agroforestry result in higher crop yields compared to non-agroforestry? Or can we identify any significant positive contribution to soil organic carbon when agroforestry is practised?

What is the proportional effect of different agroforestry practices on response ratios (RR)? Or, put another way, which agroforestry practices are performing best and which are performing worst?

To visually summarise results that can answer this question, a simple bar plot is great! To show the proportional impact on RR for the different agroforestry practices, the best thing we can do is use the back-transformed RR in percent, corrected for Jensen’s inequality, found in the column “RR.pc.jen.”

However, before we proceed to actually making the bar plot of proportional effects on RR, we have to deal with two important issues:

Omitting outliers from the analysed data

Identifying outliers

We will use a combined outlier detection method as suggested by Evgeni Chasnovski on his website, Question Flow. We perform it by applying functions from the packages “dplyr” and “ruler” to the analysed agroforestry data. The method combines different outlier detection techniques to identify rows that are “strong outliers”, that is, rows that might be considered outliers by several methods. The combined outlier detection method can be divided into the following steps:

The resulting function (isnt_out_funs) of the combined methods has outputs for three methods (Z-score, Z-score with MAD and Tukey’s fences). Their names are considered as method names.

Note that all listed approaches depend on the choice of the univariate outlier detection method. We will use all three previously listed univariate techniques.

For the “agroforestry.analyzed” dataset, rules for column-based non-outlier rows can be defined from 7 numeric columns and the 3 univariate detection methods presented. There is a convenient way of computing all of them at once using a scoped variant of dplyr::transmute():
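The original chunk is not reproduced here. As a minimal base-R sketch of the same idea, the three univariate rules can be written as functions that return TRUE for non-outliers and then applied to every numeric column. The function names follow the isnt_out_ naming mentioned above, but the thresholds (3 for Z-score and MAD, 1.5 for Tukey’s fences) and the toy data are assumptions.

```r
# Each rule returns TRUE for values that are NOT outliers.
isnt_out_z <- function(x, thres = 3, na.rm = TRUE) {
  abs(x - mean(x, na.rm = na.rm)) <= thres * sd(x, na.rm = na.rm)
}
isnt_out_mad <- function(x, thres = 3, na.rm = TRUE) {
  abs(x - median(x, na.rm = na.rm)) <= thres * mad(x, na.rm = na.rm)
}
isnt_out_tukey <- function(x, k = 1.5, na.rm = TRUE) {
  quar <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = na.rm))
  iqr  <- diff(quar)
  (quar[1] - k * iqr <= x) & (x <= quar[2] + k * iqr)
}
isnt_out_funs <- list(z = isnt_out_z, mad = isnt_out_mad, tukey = isnt_out_tukey)

# Hypothetical toy data with one unusually large Observations value.
toy <- data.frame(
  Observations = c(12, 15, 11, 14, 13, 12, 16, 15, 14, 400),
  RR           = c(0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.13, 0.12, 0.11, 0.10)
)

# Apply every rule to every numeric column; names are <column>_<method>.
num_cols <- names(toy)[sapply(toy, is.numeric)]
flags <- list()
for (col in num_cols) {
  for (method in names(isnt_out_funs)) {
    flags[[paste(col, method, sep = "_")]] <- isnt_out_funs[[method]](toy[[col]])
  }
}
flags <- as.data.frame(flags)
flags
```

In this toy sample the Z-score rule does not flag the value 400, because a single extreme point inflates the standard deviation, while the MAD and Tukey rules do. That is one reason for combining several methods.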

Observations_z <lgl>	Studies_z <lgl>	Sites_z <lgl>	RR.Shapiro.Sig_z <lgl>	RR_z <lgl>	RR.median_z <lgl>
TRUE	TRUE	TRUE	TRUE	TRUE	TRUE
TRUE	TRUE	TRUE	TRUE	TRUE	TRUE
TRUE	TRUE	TRUE	TRUE	TRUE	TRUE
TRUE	TRUE	TRUE	TRUE	TRUE	TRUE
TRUE	TRUE	TRUE	TRUE	TRUE	TRUE
TRUE	TRUE	TRUE	TRUE	TRUE	TRUE
TRUE	FALSE	FALSE	TRUE	TRUE	TRUE
TRUE	TRUE	TRUE	TRUE	TRUE	TRUE
FALSE	FALSE	FALSE	TRUE	TRUE	TRUE
FALSE	FALSE	FALSE	TRUE	TRUE	TRUE

The output of the code above is essentially a logical matrix of “TRUE”/“FALSE” values, with one row for each unique group of Practice (PrName) and Outcome (Out.SubInd) and one column for each combination of numeric column and method. It covers all the methods applied to the 166 groups. The column names consist of the column name and the method name, separated by an underscore. So the name “RR.pc.jen_z” is interpreted as the result of method “z” (Z-score) applied to the “RR.pc.jen” column.

To define non-outlier rows based on Mahalanobis distance, one can apply a univariate method to the distances computed for some subset of numeric columns. To keep things simple, we will choose one “subset” with all numeric columns and all listed methods.
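A minimal sketch of the distance computation using stats::mahalanobis(), which returns squared distances from the column means. The helper name maha_dist and the toy data are hypothetical. Any of the univariate rules can then be applied to the resulting vector of distances.

```r
# Squared Mahalanobis distance of each row from the column means,
# computed over all numeric columns of a data frame.
maha_dist <- function(df) {
  num <- as.matrix(df[sapply(df, is.numeric)])
  mahalanobis(num, center = colMeans(num), cov = cov(num))
}

# Hypothetical toy data: y follows x closely except in the last row.
toy <- data.frame(
  x = 1:10,
  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 2.0)
)

d <- maha_dist(toy)
d

# The row that breaks the joint pattern has the largest distance.
which.max(d)
```

Note that the last row is unremarkable in x and y taken separately. It only stands out when both columns are considered together, which is what the Mahalanobis distance captures.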

Definition of non-outlier rows based on groupings depends on a group summary function and univariate outlier detection methods. As grouping column we will choose the two non-numeric columns Practices (PrName) and Outcomes (Out.SubInd), and unite them into one called “PrName_Out.SubInd” - this will make it easier for our later imputation of non-outlier rows.
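Uniting the two grouping columns can be sketched in base R with paste(). The underscore separator matches the PrName_Out.SubInd values shown in the output, while the toy data and the use of the group mean as summary function are illustrative assumptions.

```r
# Hypothetical toy data with the two grouping columns.
toy <- data.frame(
  PrName     = c("Agroforestry Fallow", "Agroforestry Fallow", "Parklands"),
  Out.SubInd = c("Crop Yield", "Crop Yield", "Biomass Yield"),
  RR         = c(0.5, 1.5, 1.0),
  stringsAsFactors = FALSE
)

# Unite Practice and Outcome into a single grouping column.
toy$PrName_Out.SubInd <- paste(toy$PrName, toy$Out.SubInd, sep = "_")

# Example group summary: mean RR per united group.
group_means <- tapply(toy$RR, toy$PrName_Out.SubInd, mean)
group_means
```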

PrName_Out.SubInd <chr>
Agroforestry Fallow_Biomass Yield
Agroforestry Fallow_Crop Yield
Agroforestry Fallow_Infiltration Rate
Agroforestry Fallow_Soil Nitrogen
Agroforestry Fallow_Water Use Efficiency
Agroforestry Fallow-Agroforestry Pruning_Biomass Yield
Agroforestry Fallow-Agroforestry Pruning_Crop Yield
Agroforestry Fallow-Agroforestry Pruning_Infiltration Rate
Agroforestry Fallow-Agroforestry Pruning_Soil Nitrogen
Agroforestry Fallow-Agroforestry Pruning_Soil Organic Carbon

The output of the code above is again essentially a logical matrix of “TRUE”/“FALSE” values for the unique groups of Practices (PrName) and Outcomes (Out.SubInd). It covers all the methods applied to the 166 groups. The names consist of the summarised column and the method name, separated by an underscore. So the name “RR.pc.jen_z” is interpreted as the result of method “z” (Z-score) for a summary function equal to the mean value of the “RR.pc.jen” column. The grouping column defines the names of the groupings.

Exposure

Column and Mahalanobis based definitions of non-outlier rows can be expressed with row packs, and group based definitions with group packs. This is syntax from the ruler package.

Application of all those packs is called the exposing process. The result is an exposure from which we can extract a tidy data validation report using get_report.

The pack “group” defines the group pack and is represented in breaker_report with id 0. To obtain row outliers based on grouping, we need to expand those rows with information about the rows in the data that belong to those groups. This can be done using the dplyr::left_join() function:

pack <chr>	rule <chr>	id <int>
column	Observations_z	9
column	Observations_z	10
column	Observations_z	16
column	Studies_z	7
column	Studies_z	9
column	Studies_z	10
column	Studies_z	16
column	Sites_z	7
column	Sites_z	9
column	Sites_z	10

The output of the code above gives us a tibble called “outliers” which contains data about outlier rows.

Given the dataframe “outliers,” one can identify outliers in whatever way suits the analysis. Here we will use the basic combination approach based on average scores. You can see in the table above, where a random sample of 10 rows is shown, that every row has an “outlier score.” We can use this score to remove rows whose combined outlier score is above a chosen value. The combined outlier detection score for a given row can be defined as the share of applied methods that tagged it as an outlier. Alternatively, one can define it as the number of those methods, which only changes the absolute value of the result and not the order.

Next we will use the combined outlier detection score to remove observations from our analysed agroforestry dataframe (agroforestry.analyzed), by setting a threshold to 0.3 and all observations above 0.3 will be removed.
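The score and the 0.3 cut-off can be sketched as follows, assuming a table of logical non-outlier flags with one column per method (TRUE meaning “not an outlier”). The flag values are hypothetical.

```r
# Hypothetical non-outlier flags: one row per Practice x Outcome group,
# one column per <column>_<method> rule (TRUE = not an outlier).
non_outlier_flags <- data.frame(
  RR_z          = c(TRUE, TRUE,  FALSE),
  RR_mad        = c(TRUE, FALSE, FALSE),
  RR_tukey      = c(TRUE, TRUE,  FALSE),
  Studies_z     = c(TRUE, TRUE,  TRUE),
  Studies_mad   = c(TRUE, TRUE,  TRUE),
  Studies_tukey = c(TRUE, TRUE,  TRUE)
)

# Combined score: share of methods that tagged the row as an outlier.
outlier_score <- rowMeans(!as.matrix(non_outlier_flags))
outlier_score

# Keep rows whose score does not exceed the threshold from the text.
score_threshold <- 0.3
keep_rows <- outlier_score <= score_threshold
keep_rows
```

The logical vector keep_rows can then be used to subset the analysed data frame.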

Note that the threshold here is again arbitrary and depends on the nature of the data and the total availability of data. In our case we have 166 rows, one for each unique combination of Outcome (Out.SubInd) and Practice (PrName). Hence we do not want to set the threshold too low, as that would remove too many observations.

Visualising and assessing outliers

We can get a large variety of information out of this outlier analysis. Here we are going to look at the outliers for the “original” RR and percentage corrected RR.pc.jen as illustrated by: 1) Practices, 2) Outcomes and 3) Number of studies.

We can also look at the individual agroforestry practices and/or outcomes (and their combinations) to see which ones have the most outliers. Let us now look at which groups of our analysed agroforestry data are most prone to outliers. We will use the dataset with breaker reports that we prepared earlier.

var <chr>
Agroforestry Pruning_Labour Person Hours
Agroforestry Pruning-Inorganic Fertilizer_Beneficial Organisms
Agroforestry Pruning_Biodiversity
Agroforestry Pruning_Variable Cost
Agroforestry Pruning_Beneficial Organisms
Agroforestry Pruning_Erosion
Agroforestry Pruning_Return to Labour
Agroforestry Pruning-Alleycropping_Runoff
Agroforestry Pruning-Organic Fertilizer_Cation Exchange Capacity
Agroforestry Pruning-Inorganic Fertilizer_Variable Cost

As could be expected, “Agroforestry Pruning” makes up the majority of the top breaker groups, meaning that this group of agroforestry practices tends to have the most outliers. This is expected, since most studies and most data come from this agroforestry practice group. Feel free to go through the groups yourself.

Even with only basic outlier detection methods, one can achieve insightful results by combining them. Observations that are tagged as outliers by more than some threshold number of methods might be called “strong outliers.” Those should be considered outliers based on the whole data rather than on separate features.

Next we are going to exclude outcome-practice combinations with extreme outliers.

Omitting outliers

We are going to exclude the outcome-practice combinations with extreme outliers using the extreme outlier removal method explained earlier (see section Using the ERAAnalyze function). With this approach, all observations found beyond 3 * IQR are removed. Let us first have a look at the analysed agroforestry data, sorted by RR.pc.jen.

PrName <chr>	Out.SubInd <chr>	RR <dbl>	RR.se <dbl>	RR.pc.jen <dbl>	RR.pc.jen.low <dbl>
Agroforestry Pruning	Erosion	0.64256	0.55725	Inf	NA
Agroforestry Pruning-Inorganic Fertilizer	Beneficial Organisms	3.03156	0.37196	2199.61084	1485.30990
Agroforestry Pruning	Labour Person Hours	1.79849	0.46449	1960.34916	NA
Agroforestry Pruning-Alleycropping	Runoff	1.87892	0.42875	585.42918	346.43561
Agroforestry Pruning	Beneficial Organisms	1.17269	0.43090	326.82333	177.40268
Agroforestry Pruning	Biodiversity	1.16253	0.17412	316.86641	NA
Parklands	Net Present Value	1.36345	0.46081	312.28101	160.05553
Agroforestry Fallow-Water Harvesting	Crop Yield	1.35338	0.15934	304.63288	245.03304
Alleycropping	Soil NO3	0.54931	0.38842	301.61756	NA
Agroforestry Fallow-Water Harvesting	Biomass Yield	1.32393	0.23377	296.92613	214.18443

Again we see that it is indeed the practice “Agroforestry Pruning” that has the most extreme outliers! However, outliers are also found for many other agroforestry practices. Next, we are going to remove all these outliers using an extreme outlier removal method in which values above or below 3 * IQR (interquartile range) are removed.

Open the code chunk above if you wish to see the code for removing the extreme outliers in the columns RR and RR.pc.jen.
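For readers without access to the folded chunk, a minimal sketch of an extreme-outlier filter is given here. It assumes the common convention of fences placed 3 * IQR below the first quartile and above the third quartile, and it sets flagged values to NA rather than dropping rows. Both choices are assumptions, and the toy values are hypothetical.

```r
# Flag values lying more than k * IQR outside the quartiles.
is_extreme <- function(x, k = 3) {
  q   <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  (x < q[1] - k * iqr) | (x > q[2] + k * iqr)
}

# Hypothetical percentage response ratios with one extreme value.
rr_pc_jen <- c(10, 12, 8, 15, 11, 9, 13, 14, 2199.6)

# Replace extreme values with NA so they can be handled later.
rr_pc_jen_clean <- rr_pc_jen
rr_pc_jen_clean[is_extreme(rr_pc_jen)] <- NA
rr_pc_jen_clean
```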

We see that we have effectively removed outliers amounting to six observations. We also find that we have around 200 missing values in the observations. Next we are going to perform median imputation on the missing RR and RR.pc.jen values. We do this to be able to produce our final bar plot of the proportional effect on RR and RR.pc.jen for the different agroforestry practices.

Impute missing values in the analysed data

Median imputation of missing RR and RR.pc.jen values in the analysed agroforestry data

We will now use a median imputation technique for each of the agroforestry groups, where the median is found across groups of outcome and practice.

Note: median imputation, like any imputation technique, is a bold statistical move that calls for caution, and the visual outcome of this comparison should be interpreted with strong reservations that limit the conclusions we can draw.
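Group-wise median imputation can be sketched in base R with ave(). Here the groups are taken to be the practices (PrName), which is an assumption about the grouping used. The helper name impute_median and the toy values are hypothetical.

```r
# Replace missing values in a vector with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Hypothetical toy data with missing RR values in each practice group.
toy <- data.frame(
  PrName = c("Alleycropping", "Alleycropping", "Alleycropping",
             "Parklands", "Parklands", "Parklands", "Parklands"),
  RR     = c(0.2, NA, 0.4, 1.0, 1.2, NA, 2.0),
  stringsAsFactors = FALSE
)

# Impute within each practice group.
toy$RR.imputed <- ave(toy$RR, toy$PrName, FUN = impute_median)
toy
```

A group in which every value is missing would stay missing under this approach, since there is no observed median to fall back on.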

Now that we “got rid of the NA values,” we can plot our RR and our RR.pc.jen for each of the different ERA agroforestry practices.

Variations in RR for agroforestry practices

Variations in RR.pc.jen for agroforestry practices

Bar plot of RR percentage with Jensen correction (RR.pc.jen) for all agroforestry practices

Variations in RR for agroforestry practices across Crop and Biomass Yield outcomes

It is not always meaningful to look across all outcomes. Let’s look only at the outcomes crop yield and biomass yield.

We do this by selecting data from the ‘agroforestry.analyzed.imputed’ that only has Biomass and Crop Yield outcomes
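A minimal sketch of this selection. The data frame name comes from the text, but its contents here are hypothetical toy values, and the outcome labels are taken from the Out.SubInd values shown earlier.

```r
# Hypothetical stand-in for the imputed analysis results.
agroforestry.analyzed.imputed <- data.frame(
  PrName     = c("Agroforestry Pruning", "Agroforestry Pruning",
                 "Parklands", "Alleycropping"),
  Out.SubInd = c("Crop Yield", "Soil Nitrogen",
                 "Biomass Yield", "Water Use Efficiency"),
  RR.pc.jen  = c(12, 5, 20, 8),
  stringsAsFactors = FALSE
)

# Keep only the yield outcomes.
yield_outcomes <- c("Crop Yield", "Biomass Yield")
agroforestry.yield <- agroforestry.analyzed.imputed[
  agroforestry.analyzed.imputed$Out.SubInd %in% yield_outcomes, ]
agroforestry.yield
```

The same pattern applies to the other outcome groups below by changing the vector of outcome names.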

Plotting RR.pc.jen for all agroforestry practices with Biomass and Crop Yield outcomes

Variations in RR for agroforestry practices across other outcomes

We do this by selecting data from the ‘agroforestry.analyzed.imputed’ that only has Soil Nitrogen and Total Soil Nitrogen outcomes

Plotting RR for all agroforestry practices with Soil Nitrogen and Total Soil Nitrogen outcomes

Plotting RR.pc.jen for all agroforestry practices with Soil Nitrogen and Total Soil Nitrogen outcomes

We do this by selecting data from the ‘agroforestry.analyzed.imputed’ that only has Water Use Efficiency outcomes

Plotting RR.pc.jen for all agroforestry practices with Water Use Efficiency outcomes

We do this by selecting data from the ‘agroforestry.analyzed.imputed’ that only has Soil Organic Carbon and Soil Organic Matter and Soil Carbon Stocks outcomes

Plotting RR for all agroforestry practices with Soil Organic Carbon and Soil Organic Matter and Soil Carbon Stocks outcomes

Plotting RR.pc.jen for all agroforestry practices with Soil Organic Carbon and Soil Organic Matter and Soil Carbon Stocks outcomes

Life.span <chr>	n <int>
annual	71845
NA	28074
perennial	6541
annual, biennial	967
biennial	243
biennial, perennial	207
annual, perennial	75