Relation between median and mean using R

We will work mainly with R, a free software environment for statistical computing and graphics. When the data have a symmetrical distribution without outliers, the sample mean and the sample median are very close.


consume <- c(6.9, 6.3, 6.2, 6.5, 6.4, 6.8, 6.6)
data.frame(mean = mean(consume), median = median(consume))

> data.frame(mean = mean(consume), median = median(consume))
      mean median
1 6.528571    6.5

However, when the distribution is asymmetrical, the mean and the median do not coincide:

  • Right asymmetry: the mean is greater than the median
  • Left asymmetry: the mean is less than the median
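
Both rules can be checked quickly on two small made-up vectors. The names `right_skewed` and `left_skewed` and their values are illustrative assumptions, not data from this post:

```r
# Hypothetical data: most values are small, with one large value on the right
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 10)
# Mirror image: most values are large, with one small value on the left
left_skewed <- 11 - right_skewed

# Right asymmetry: the mean exceeds the median
mean(right_skewed) > median(right_skewed)   # TRUE
# Left asymmetry: the mean is below the median
mean(left_skewed) < median(left_skewed)     # TRUE
```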

Another example illustrates this. We will use the following data in R:

 salaries=c(903, 2684, 550, 1571, 1190, 857, 547, 2401, 1257, 411, 3500, 284, 7537, 1666, 604, 692, 450, 770, 3013, 566)

This vector holds the monthly salaries of 20 workers at one company. We can calculate the mean and the median, and mark both on a histogram, using these commands in R:

mean(salaries)    # 1572.65
median(salaries)  # 880
hist(salaries)
# vertical lines: mean in blue, median in red
abline(v = c(mean(salaries), median(salaries)), col = c("blue", "red"))

The resulting histogram, with the mean marked in blue and the median in red, is shown in Fig. 1.

Figure 1. Histogram of salaries

It is important to note that the mean salary can be misleading here, because 70% of the workers earn less than the average salary:
mean(salaries<mean(salaries)) 

[1] 0.7
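
To see how much a single extreme salary drives this, one can drop the largest value (7537) and recompute both statistics. This is only a sketch; the name `without_top` is introduced here for illustration:

```r
salaries <- c(903, 2684, 550, 1571, 1190, 857, 547, 2401, 1257, 411,
              3500, 284, 7537, 1666, 604, 692, 450, 770, 3013, 566)

# Remove the single largest salary
without_top <- salaries[salaries != max(salaries)]

# Compare mean and median with and without that value
data.frame(
  data   = c("all 20 workers", "largest salary removed"),
  mean   = c(mean(salaries), mean(without_top)),
  median = c(median(salaries), median(without_top))
)
```

The mean drops from 1572.65 to about 1258.7, while the median only moves from 880 to 857, so the median is the more robust summary for these data.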

We can also use a dot chart, which is shown in Fig. 2.

Figure 2. Dotchart

 The code is:


dotchart(salaries, pch = 16, xlab = "salary")
# solid red line: mean; dashed blue line: median
abline(v = mean(salaries), col = "red", lwd = 2)
abline(v = median(salaries), col = "blue", lty = 2, lwd = 2)
legend("bottomright", c("mean", "median"),
       col = c("red", "blue"), lty = c(1, 2), lwd = c(2, 2), box.lty = 0, cex = 1.5)

Therefore, the median and the mean are not the same. You must be precise about which one you report and use them wisely. There are several kinds of mean in mathematics, especially in statistics, and for a given data set one or another may be more appropriate.
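
As a rough illustration of "several kinds of mean", the sketch below compares the arithmetic mean with a 10% trimmed mean (the `trim` argument of base R's `mean()`) and a geometric mean on the same salaries. The choice of these two alternatives and the 10% trimming level are assumptions made for the example, not recommendations from this post:

```r
salaries <- c(903, 2684, 550, 1571, 1190, 857, 547, 2401, 1257, 411,
              3500, 284, 7537, 1666, 604, 692, 450, 770, 3013, 566)

arithmetic <- mean(salaries)
# Trimmed mean: drops the lowest and highest 10% of values before averaging
trimmed <- mean(salaries, trim = 0.1)
# Geometric mean: valid here because every salary is positive
geometric <- exp(mean(log(salaries)))

# Show all the summaries side by side
c(arithmetic = arithmetic, trimmed = trimmed,
  geometric = geometric, median = median(salaries))
```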