Aggregate functions
Aggregate functions collapse multiple rows into a single result. They are used with GROUP BY to compute summaries per group, or without GROUP BY to compute a single result across all rows.
Counts rows. COUNT(*) counts all rows. COUNT(column) counts non-NULL values.
SELECT COUNT(*) AS total_rowsFROM my_namespace.sales_data
SELECT department, COUNT(*) AS dept_countFROM my_namespace.sales_dataGROUP BY departmentORDER BY dept_count DESCReturns the sum of values in a column.
SELECT SUM(total_amount) AS grand_totalFROM my_namespace.sales_data
SELECT department, SUM(total_amount) AS dept_totalFROM my_namespace.sales_dataGROUP BY departmentORDER BY dept_total DESCReturns the average of values in a column. Alias: mean.
SELECT AVG(total_amount) AS avg_amountFROM my_namespace.sales_data
SELECT department, AVG(total_amount) AS avg_amountFROM my_namespace.sales_dataGROUP BY departmentORDER BY avg_amount DESCReturns the minimum value. Works on numeric and string columns.
SELECT MIN(total_amount) AS min_amount, MIN(customer_id) AS first_customerFROM my_namespace.sales_data
SELECT department, MIN(total_amount) AS min_amountFROM my_namespace.sales_dataGROUP BY departmentReturns the maximum value. Works on numeric and string columns.
SELECT MAX(total_amount) AS max_amount, MAX(customer_id) AS last_customerFROM my_namespace.sales_data
SELECT department, MAX(total_amount) AS max_amountFROM my_namespace.sales_dataGROUP BY departmentApproximate aggregation functions produce statistically estimated results while using significantly less memory and compute than their exact counterparts. Use them when analyzing large datasets and an approximate result is acceptable.
Returns the approximate value at a given percentile using a T-Digest algorithm. The percentile parameter must be between 0.0 and 1.0 inclusive.
SELECT approx_percentile_cont(total_amount, 0.5) AS median, approx_percentile_cont(total_amount, 0.95) AS p95FROM my_namespace.sales_data
SELECT department, approx_percentile_cont(total_amount, 0.5) AS medianFROM my_namespace.sales_dataGROUP BY departmentORDER BY median DESCReturns the approximate weighted percentile. Rows are weighted by the weight column.
SELECT approx_percentile_cont_with_weight(unit_price, quantity, 0.5) AS weighted_medianFROM my_namespace.sales_dataWHERE unit_price IS NOT NULL AND quantity IS NOT NULLReturns the approximate median. Equivalent to approx_percentile_cont(column, 0.5).
SELECT approx_median(total_amount) AS median_amountFROM my_namespace.sales_data
SELECT department, approx_median(total_amount) AS medianFROM my_namespace.sales_dataGROUP BY departmentReturns the approximate count of distinct values using HyperLogLog.
SELECT approx_distinct(customer_id) AS unique_customersFROM my_namespace.sales_data
SELECT department, approx_distinct(customer_id) AS unique_customersFROM my_namespace.sales_dataGROUP BY departmentReturns the k most frequent values with their approximate counts.
SELECT approx_top_k(department, 5) AS top_departmentsFROM my_namespace.sales_dataReturns the sample variance.
SELECT var(total_amount) AS varianceFROM my_namespace.sales_data
SELECT department, var(total_amount) AS varianceFROM my_namespace.sales_dataGROUP BY departmentReturns the population variance.
SELECT var_pop(total_amount) AS pop_varianceFROM my_namespace.sales_dataReturns the sample standard deviation.
SELECT stddev(total_amount) AS std_devFROM my_namespace.sales_data
SELECT department, stddev(total_amount) AS std_devFROM my_namespace.sales_dataGROUP BY departmentReturns the population standard deviation.
SELECT stddev_pop(total_amount) AS pop_std_devFROM my_namespace.sales_dataReturns the sample covariance. Alias: covar.
SELECT covar_samp(total_amount, CAST(quantity AS DOUBLE)) AS covarianceFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the population covariance.
SELECT covar_pop(total_amount, CAST(quantity AS DOUBLE)) AS pop_covarianceFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the Pearson correlation coefficient between two columns.
SELECT corr(total_amount, CAST(quantity AS DOUBLE)) AS correlationFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the slope of the linear regression line.
SELECT regr_slope(total_amount, CAST(quantity AS DOUBLE)) AS slopeFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the y-intercept of the linear regression line.
SELECT regr_intercept(total_amount, CAST(quantity AS DOUBLE)) AS interceptFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the count of non-NULL pairs.
SELECT regr_count(total_amount, CAST(quantity AS DOUBLE)) AS pair_countFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the coefficient of determination (R-squared).
SELECT regr_r2(total_amount, CAST(quantity AS DOUBLE)) AS r_squaredFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the average of the independent variable (x) for non-NULL pairs.
SELECT regr_avgx(total_amount, CAST(quantity AS DOUBLE)) AS avg_qtyFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the average of the dependent variable (y) for non-NULL pairs.
SELECT regr_avgy(total_amount, CAST(quantity AS DOUBLE)) AS avg_amountFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the sum of squares of the independent variable.
SELECT regr_sxx(total_amount, CAST(quantity AS DOUBLE)) AS sxxFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the sum of squares of the dependent variable.
SELECT regr_syy(total_amount, CAST(quantity AS DOUBLE)) AS syyFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the sum of products of the paired variables.
SELECT regr_sxy(total_amount, CAST(quantity AS DOUBLE)) AS sxyFROM my_namespace.sales_dataWHERE total_amount IS NOT NULL AND quantity IS NOT NULLReturns the bitwise AND of all values in a group.
SELECT department, bit_and(quantity) AS and_resultFROM my_namespace.sales_dataWHERE quantity IS NOT NULLGROUP BY departmentReturns the bitwise OR of all values in a group.
SELECT department, bit_or(quantity) AS or_resultFROM my_namespace.sales_dataWHERE quantity IS NOT NULLGROUP BY departmentReturns the bitwise XOR of all values in a group.
SELECT department, bit_xor(quantity) AS xor_resultFROM my_namespace.sales_dataWHERE quantity IS NOT NULLGROUP BY departmentReturns true if all values in a group are true.
SELECT department, bool_and(is_completed) AS all_completedFROM my_namespace.sales_dataWHERE is_completed IS NOT NULLGROUP BY departmentReturns true if any value in a group is true.
SELECT department, bool_or(is_completed) AS any_completedFROM my_namespace.sales_dataWHERE is_completed IS NOT NULLGROUP BY departmentReturns the first value in a group according to the specified ordering.
SELECT department, first_value(customer_id ORDER BY total_amount ASC) AS lowest_spenderFROM my_namespace.sales_dataWHERE total_amount IS NOT NULLGROUP BY departmentReturns the last value in a group according to the specified ordering.
SELECT department, last_value(customer_id ORDER BY total_amount ASC) AS highest_spenderFROM my_namespace.sales_dataWHERE total_amount IS NOT NULLGROUP BY department