A calculation is a purposive process that changes one or more inputs into results. Calculations can be included in your XLSForm as explained in this guide. They can be referenced like any other question in your survey worksheet. Please note that in order to see the result of a calculation, you MUST reference it because a calculation is a hidden field in an XLSForm. Referencing is done in a note question type or included in a label or hints in the format ${name_of_calculation_field}.
A comprehensive list of calculations can be found on the Open Data Kit website. We shall go in-depth for the following calculate functions:
Calculate Function | Use |
sum() | Totals up a group of values. |
count() | Counts the number of occurrences of a specified value. |
round() | Used for cases where you want your calculation result to have a specific number of decimal places. |
min() | Returns the smallest value in a group of values. |
max() | Returns the largest value in a group of values. |
join() | Joins a list of values together. |
coalesce() | Returns the first stated value, if missing, then it returns the second one. |
position(..) | This will calculate the current position of the repeat index in repeat groups in the form. |
if() | This allows you to do value-based calculations in your form. It is used whereby you require a value x to be returned if a certain statement is true, otherwise, a value y should be returned. |
concat() | Helps you merge two or more responses into one. |
mod() | The modulo operation finds the remainder after the division of one number by another. |
Survey Sheet Examples:
sum(), count() and round() functions
Below is a survey sheet used to collect the names, gender, and ages of members in a household. The use of sum(), count(), and round() functions are depicted below using a repeat group. In this example, we want to get the total sum of ages, the number of household members, and the average age in the household.
type | name | label | calculation |
begin repeat | members | ||
text | name | What is the name of the household member? | |
select_one gender | gender | What is ${name}’s gender? | |
integer | age | How old is ${name}? | |
end repeat | members | ||
calculate | sum_ages | sum(${age}) | |
calculate | hh_number | count(${name}) | |
calculate | average_age | round(${sum_ages} div ${hh_number},1) |
The calculate fields above performed the following:
- sum(${age}) – This will add up all the ages entered in the repeat group.
- count(${name}) – This will count the total number of names entered in the repeat group- hence enabling us to know the household size.
- round(${sum_ages} div ${hh_number},1) – This will divide the sum of all ages by the household size and round off the answer to 1 decimal place – hence giving us the average age of the household.
min() and max() functions
In the members’ repeat group, the data collector entered each household member’s age. If we want to get the minimum age in the household, we will add an additional calculate question outside the repeat group, and we use the min() function. In this case, this will be min(${age}). For the maximum age, we use the max() function i.e max(${age}) and this will return the minimum age and maximum age in the household respectively.
type | name | label | calculation |
calculate | min_age | min(${age}) | |
calculate | max_age | max(${age}) |
join() function
If you wanted to return a list of names entered, from a repeat group, you can use the join() function: join(‘separator’, ${field name}). In our example below, we can return the list of names entered in members repeat group into a comma-separated list of names for display using the join() function join(‘, ‘, ${name}), as shown below:
Note: The ${name} is a field within a repeat, it returns a list of names which allows you to use the join() function to create a comma-separated list of names.
type | name | label | calculation |
begin repeat | members | ||
text | name | What is the name of the household member? | |
select_one gender | gender | What is ${name}’s gender? | |
integer | age | How old is ${name}? | |
end repeat | members | ||
calculate | join_names | join(‘, ‘, ${name}) |
The function the join(‘, ‘, ${name}) combines the names entered in the repeat group, using a comma (,) as a separator e.g. Megan, Mary, Jane. You can replace the comma with other separators e.g a colon (:), hyphen (-), semicolon (;), or even a tilde (~).
coalesce() function
Coalesce() function is used when you have two alternatives and either of the options is used in case the other is missing with reference to the first one when both exist. If the household survey required you to enter both the first and last name of the interviewee separately, then use either name, depending on which was stated, with a preference for the first name over the last name, we would use the coalesce function. In our scenario, we want either the first name to be used, and if it is missing, then the last name can be used. The function would look like : coalesce(${firstname},${lastname})
type | name | label | calculation |
text | firstname | What is the first name of the interviewee? | |
text | lastname | What is the second name of the interviewee? | |
calculate | coalesce_display | coalesce(${firstname},${lastname}) |
position(..) function
This function will return the index of the repeat group. For example, if placed within a repeat group and the repeat group is in the second repeat, it returns the number 2 giving you the current index of the repeat group. It can be used outside a loop to give you the last index of the repeat, e.g. 4 if you only filled in 4 repeats.
Note: If position() is used instead of position(..), it will not work.
In the household survey, if I wanted to know how many members were questioned, I would place the function position(..) outside the repeat group, and this would return the index of the last completed repeat, which would suggest the number of members interviewed.
type | name | label | calculation |
begin repeat | members | ||
text | name | What is the name of the household member? | |
select_one gender | gender | What is ${name}’s gender? | |
integer | age | How old is ${name}? | |
end repeat | members | ||
calculate | position_names | position(..) |
Position(..) index is also used in the indexed-repeat() function as a position index of a certain loop when pulling information from a repeat group. For more illustration on how the position(..) index is used inside a repeat group and how it is used in the indexed-repeat() function, read this help documentation on how to use repeats.
if(condition, a, b) function
if(condition, a,b) is a conditional function, meaning if the `condition` is true, it returns the value `a`, and if it is false it returns the value `b`. Suppose we were interested in the food consumption of households in Nairobi, the below survey collects data on how much a household spent on food products they have used in the last 4 weeks.
type | name | label | calculation | relevant |
select_one yes_no | used_sugar | Did you purchase sugar within the last 4 weeks? | ||
integer | cost_sugar | How much did you spend on sugar: | selected(${used_sugar},’yes’) | |
select_one yes_no | used_flour | Did you purchase maize flour within the last 4 weeks? | ||
integer | cost_flour | How much did you spend on maize flour: | selected(${used_flour},’yes’) | |
select_one yes_no | used_milk | Did you purchase milk within the last 4 weeks? | ||
integer | cost_milk | How much did you spend on milk: | selected(${used_milk},’yes’) | |
select_one yes_no | used_oil | Did you purchase cooking oil within the last 4 weeks? | ||
integer | cost_oil | How much did you spend on cooking oil: | selected(${used_oil},’yes’) |
If a household purchased a product, they will be asked how much they spent, otherwise they will move to the next question. This means some costs will be blank for those who did not use that product.
If we use this method to calculate how much the household spent on the products (${cost_sugar} + ${cost_flour} + ${cost_milk} + ${cost_oil}, and say ${cost_flour} did not have a value, the calculation will return a blank (NaN) value.
There is a workaround for this, and it involves using an IF statement to assign the value “0” to any unanswered question. If an answer is skipped without including an IF statement, then the “value” assigned to that skipped question is blank, and the blank will always be considered “less” than any other numeric value. Since a calculation can’t be performed on a blank value (a blank value isn’t 0, it is simply nothing), we must assign a value 0 if we want valid results to be returned.
For this case, these are the calculations that we will add to our survey sheet, right after the above group of questions:
type | name | label | calculation |
calculate | sugar_cost | if(${cost_sugar}>=0,${cost_sugar},0) | |
calculate | flour_cost | if(${cost_flour}>=0,${cost_flour},0) | |
calculate | milk_cost | if(${cost_milk}>=0,${cost_milk},0) | |
calculate | oil_cost | if(${cost_oil}>=0,${cost_oil},0) | |
calculate | total_cost | ${sugar_cost}+${flour_cost}+${milk_cost}+${oil_cost} |
You can see that we have introduced calculations with this syntax: If (${cost_sugar}>=0, ${cost_sugar}, 0). This means that if cost_sugar is equal to or greater than 0, then the calculation will use the entered value, else if cost_sugar is left blank/not answered, then 0 will be used as a value for sugar_cost in the total_cost calculation.
Using the IF Statement, we can also determine how many items were purchased in that household. To achieve this, we’ll add the following calculate questions (seen below) to our survey sheet. They can be after the first group of questions asking about the items that were purchased or after the above set of questions trying to obtain the amount spent on the goods:
type | name | label | calculation |
calculate | usedsugar | if(selected(${used_sugar},’yes’),1,0) | |
calculate | usedflour | if(selected(${used_flour},’yes’),1,0) | |
calculate | usedmilk | if(selected(${used_milk},’yes’),1,0) | |
calculate | usedoil | if(selected(${used_oil},’yes’),1,0) | |
calculate | itemsused | ${usedsugar}+${usedflour}+ ${usedmilk}+${usedoil} |
Above, we’ve used this syntax: if(selected(${used_sugar},’yes’),1,0) which returns a value of ‘1’ for a ‘yes’ response in used_sugar and a value of ‘0’ for a ‘no’ response. When we add these calculate questions, we will get the number of items purchased in that household.
After calculating the amount a household spent on buying the items, we may want to categorize the households into social classes by the amount they spend on these items. Our categories are:(Below 500 = Poor, 501 – 999 = Middle class, Above 1,000 = Upper class). We will use the IF Statement, at any point after the total amount spent on the items is calculated, to determine these social classes. A suitable IF statement would be:
type | name | label | calculation |
calculate | hh_category | if((${total_cost} <= 500), ‘Poor’, if((${total_cost} >= 501 and ${total_cost} <= 999), ‘Middle Class’,’Upper Class’)) |
The above series of if() functions is referred to as a nested if() function, and are basically if() functions within an if() function.
concat() function
The concat() function returns the concatenation of the string values using a separator between values. This function helps you merge two or more responses into one.
In our survey after collecting the first and last name of the interviewee, we would like to merge the first and second name of a person to obtain their full name:
type | name | label | calculation |
text | name_1 | What is the first name of the interviewee? | |
text | name_2 | What is the second name of the interviewee? | |
calculate | fullname | concat(${name_1},’ ‘,${name_2}) |
This will combine the two names, separating them with space e.g Jane Doe.
mod() function
Mod() function is used to check the divisibility of variables by a certain number. In the survey below, we need to ascertain whether a child had finished a complete set of 3 months before a certain treatment is administered.
type | name | label | calculation |
integer | age_child | What is your child’s age in months? | |
calculate | evaluate_3 | (${age_child} mod 3)=0 | |
note | display_result | The child has finished a set of 3 months: ${evaluate_3} |
The function (${age_child} mod 3)=0 checks the divisibility of the months by 3 to ensure it is divisible by 3. If it is, it would display the word true, and if not, it would display the word false.