1. Home
  2. XLSForm Authoring
  3. How can I use the different calculate functions in XLSForm authoring?

How can I use the different calculate functions in XLSForm authoring?

A calculation is a purposive process that changes one or more inputs into results. Calculations can be included in your XLSForm as explained in this guide. They can be referenced like any other question in your survey worksheet. Please note that in order to see the result of a calculation, you MUST reference it because a calculation is a hidden field in an XLSForm. Referencing is done in a note question type or included in a label or hints in the format ${name_of_calculation_field}.

A comprehensive list of calculations can be found on the Open Data Kit website. We shall go in-depth for the following calculate functions:

Calculate Function Use
sum() Totals up a group of values.
count() Counts the number of occurrences of a specified value.
round() Used for cases where you want your calculation result to have a specific number of decimal places.
min() Returns the smallest value in a group of values.
max() Returns the largest value in a group of values.
join() Joins a list of values together.
coalesce() Returns the first stated value, if missing, then it returns the second one.
position(..) This will calculate the current position of the repeat index in repeat groups in the form.
if() This allows you to do value-based calculations in your form. It is used whereby you require a value x to be returned if a certain statement is true, otherwise, a value y should be returned.
concat() Helps you merge two or more responses into one.
mod() The modulo operation finds the remainder after the division of one number by another.

Survey Sheet Examples:

sum(), count() and round() functions  

Below is a survey sheet used to collect the names, gender, and ages of members in a household. The use of sum(), count(), and round() functions are depicted below using a repeat group. In this example, we want to get the total sum of ages, the number of household members, and the average age in the household.

type name label calculation
begin repeat members
text name What is the name of the household member?
select_one gender gender What is ${name}’s gender?
integer age How old is ${name}?
end repeat members
calculate sum_ages sum(${age})
calculate hh_number count(${name})
calculate average_age round(${sum_ages} div ${hh_number},1)

The calculate fields above performed the following:

  • sum(${age}) – This will add up all the ages entered in the repeat group.
  • count(${name}) – This will count the total number of names entered in the repeat group- hence enabling us to know the household size.
  • round(${sum_ages} div ${hh_number},1) – This will divide the sum of all ages by the household size and round off the answer to 1 decimal place – hence giving us the average age of the household.

min() and max() functions

In the members’ repeat group, the data collector entered each household member’s age. If we want to get the minimum age in the household, we will add an additional calculate question outside the repeat group, and we use the min() function. In this case, this will be min(${age}). For the maximum age, we use the max() function i.e max(${age}) and this will return the minimum age and maximum age in the household respectively.

type name label calculation
calculate min_age min(${age})
calculate max_age max(${age})

join() function

If you wanted to return a list of names entered, from a repeat group, you can use the join() function: join(‘separator’, ${field name}). In our example below, we can return the list of names entered in members repeat group into a comma-separated list of names for display using the join() function join(‘, ‘, ${name}), as shown below:

Note: The ${name} is a field within a repeat, it returns a list of names which allows you to use the join() function to create a comma-separated list of names.

type name label calculation
begin repeat members
text name What is the name of the household member?
select_one gender gender What is ${name}’s gender?
integer age How old is ${name}?
end repeat members
calculate join_names join(‘, ‘, ${name})

The function the join(‘, ‘, ${name}) combines the names entered in the repeat group, using a comma (,) as a separator e.g. Megan, Mary, Jane. You can replace the comma with other separators e.g a colon (:), hyphen (-), semicolon (;), or even a tilde (~).

coalesce() function

Coalesce() function is used when you have two alternatives and either of the options is used in case the other is missing with reference to the first one when both exist. If the household survey required you to enter both the first and last name of the interviewee separately, then use either name, depending on which was stated, with a preference for the first name over the last name, we would use the coalesce function. In our scenario, we want either the first name to be used, and if it is missing, then the last name can be used. The function would look like : coalesce(${firstname},${lastname})

type name label calculation
text firstname What is the first name of the interviewee?
text lastname What is the second name of the interviewee?
calculate coalesce_display coalesce(${firstname},${lastname})

position(..) function

This function will return the index of the repeat group. For example, if placed within a repeat group and the repeat group is in the second repeat, it returns the number 2 giving you the current index of the repeat group. It can be used outside a loop to give you the last index of the repeat, e.g. 4 if you only filled in 4 repeats.

Note: If position() is used instead of position(..), it will not work.

In the household survey, if I wanted to know how many members were questioned, I would place the function position(..) outside the repeat group, and this would return the index of the last completed repeat, which would suggest the number of members interviewed.

type name label calculation
begin repeat members
text name What is the name of the household member?
select_one gender gender What is ${name}’s gender?
integer age How old is ${name}?
end repeat members
calculate position_names position(..)

Position(..) index is also used in the indexed-repeat() function as a position index of a certain loop when pulling information from a repeat group. For more illustration on how the position(..) index is used inside a repeat group and how it is used in the indexed-repeat() function, read this help documentation on how to use repeats.

if(condition, a, b) function

if(condition, a,b) is a conditional function, meaning if the `condition` is true, it returns the value `a`, and if it is false it returns the value `b`.  Suppose we were interested in the food consumption of households in Nairobi, the below survey collects data on how much a household spent on food products they have used in the last 4 weeks.  

type name label calculation relevant
select_one yes_no used_sugar Did you purchase sugar within the last 4 weeks?
integer cost_sugar How much did you spend on sugar: selected(${used_sugar},’yes’)
select_one yes_no used_flour Did you purchase maize flour within the last 4 weeks?
integer cost_flour How much did you spend on maize flour: selected(${used_flour},’yes’)
select_one yes_no used_milk Did you purchase milk within the last 4 weeks?
integer cost_milk How much did you spend on milk: selected(${used_milk},’yes’)
select_one yes_no used_oil Did you purchase cooking oil within the last 4 weeks?
integer cost_oil How much did you spend on cooking oil: selected(${used_oil},’yes’)

If a household purchased a product, they will be asked how much they spent, otherwise they will move to the next question. This means some costs will be blank for those who did not use that product.

If we use this method to calculate how much the household spent on the products (${cost_sugar} + ${cost_flour} + ${cost_milk} + ${cost_oil}, and say ${cost_flour} did not have a value, the calculation will return a blank (NaN) value.

There is a workaround for this, and it involves using an IF statement to assign the value “0” to any unanswered question. If an answer is skipped without including an IF statement, then the “value” assigned to that skipped question is blank, and the blank will always be considered “less” than any other numeric value. Since a calculation can’t be performed on a blank value (a blank value isn’t 0, it is simply nothing), we must assign a value 0 if we want valid results to be returned.

For this case, these are the calculations that we will add to our survey sheet, right after the above group of questions:

type name label calculation
calculate sugar_cost if(${cost_sugar}>=0,${cost_sugar},0)
calculate flour_cost if(${cost_flour}>=0,${cost_flour},0)
calculate milk_cost if(${cost_milk}>=0,${cost_milk},0)
calculate oil_cost if(${cost_oil}>=0,${cost_oil},0)
calculate total_cost ${sugar_cost}+${flour_cost}+${milk_cost}+${oil_cost}

You can see that we have introduced calculations with this syntax: If (${cost_sugar}>=0, ${cost_sugar}, 0). This means that if cost_sugar is equal to or greater than 0, then the calculation will use the entered value, else if cost_sugar is left blank/not answered, then 0 will be used as a value for sugar_cost in the total_cost calculation.

Using the IF Statement, we can also determine how many items were purchased in that household. To achieve this, we’ll add the following calculate questions (seen below) to our survey sheet. They can be after the first group of questions asking about the items that were purchased or after the above set of questions trying to obtain the amount spent on the goods:

type name label calculation
calculate usedsugar if(selected(${used_sugar},’yes’),1,0)
calculate usedflour if(selected(${used_flour},’yes’),1,0)
calculate usedmilk if(selected(${used_milk},’yes’),1,0)
calculate usedoil if(selected(${used_oil},’yes’),1,0)
calculate itemsused ${usedsugar}+${usedflour}+ ${usedmilk}+${usedoil}

Above, we’ve used this syntax: if(selected(${used_sugar},’yes’),1,0) which returns a value of ‘1’ for a ‘yes’ response in used_sugar and a value of ‘0’ for a ‘no’ response. When we add these calculate questions, we will get the number of items purchased in that household.

After calculating the amount a household spent on buying the items, we may want to categorize the households into social classes by the amount they spend on these items. Our categories are:(Below 500 = Poor, 501 – 999 = Middle class, Above 1,000 = Upper class). We will use the IF Statement, at any point after the total amount spent on the items is calculated, to determine these social classes. A suitable IF statement would be:

type name label calculation
calculate hh_category if((${total_cost} <= 500), ‘Poor’, if((${total_cost} >= 501 and ${total_cost} <= 999), ‘Middle Class’,’Upper Class’))

The above series of if() functions is referred to as a nested if() function, and are basically if() functions within an if() function.

concat() function

The concat() function returns the concatenation of the string values using a separator between values. This function helps you merge two or more responses into one.

In our survey after collecting the first and last name of the interviewee, we would like to merge the first and second name of a person to obtain their full name:

type name label calculation
text name_1 What is the first name of the interviewee?
text name_2 What is the second name of the interviewee?
calculate fullname concat(${name_1},’ ‘,${name_2})

This will combine the two names, separating them with space e.g Jane Doe.

mod() function

Mod() function is used to check the divisibility of variables by a certain number. In the survey below, we need to ascertain whether a child had finished a complete set of 3 months before a certain treatment is administered.

type name label calculation
integer age_child What is your child’s age in months?
calculate evaluate_3 (${age_child} mod 3)=0
note display_result The child has finished a set of 3 months: ${evaluate_3}

The function (${age_child} mod 3)=0 checks the divisibility of the months by 3 to ensure it is divisible by 3. If it is, it would display the word true, and if not, it would display the word false.

Related Articles

Need Support?
Can't find the answer you're looking for?
Contact Support