Hi world,
I created a box plot based on the following sorted data:
0 (19 times)
6.6
7.1
9.0
14.2
14.2
16.6
This creates a box from 0 to ~1.66. If I understand the source code correctly, the 1.66 is the result of an interpolation.
What I don't understand is why the upper whisker is drawn from 1.66 DOWN TO 0 (running through the box). Can anyone explain why this would be correct (or what is wrong)?
(using TeeChart Pro 7.12)
Thanks for any helpful hint
Martin
Unexpected BoxPlot
Hi, Martin
Looking at your data I get the following results:
Code:
median = 0.0
25th percentile = 0.0
75th percentile = 1.65
IQR = 1.65
lower inner fence = 25th PCT - 1.5*IQR = -2.475
upper inner fence = 75th PCT + 1.5*IQR = 4.125
lower adjacent point, defined as smallest value above lower inner fence. In this case, 0.0.
upper adjacent point, defined as largest value below upper inner fence. In this case 0.0.
Now, from the box plot definition (see http://cnx.org/content/m10215/latest/ ), the plot is constructed as:
1) box, lower limit 25th pct, upper limit 75th pct, in this case, the box is drawn from 0.0 to 1.65.
2) median line, drawn at 0.0
3) lower whisker at lower adjacent point, in this case 0.0
4) upper whisker at upper adjacent point, in this case 0.0
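As an illustration of steps 1) to 4), here is a minimal Python sketch (not TeeChart code) that derives the fences and adjacent points from the quartiles listed above. The function name `adjacent_points` is made up for this example, and treating a value that lies exactly on a fence as inside it is an assumption.

```python
def adjacent_points(sorted_values, q1, q3):
    """Return (lower_fence, upper_fence, lower_adjacent, upper_adjacent)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Assumption: a value lying exactly on a fence counts as inside it.
    inside = [v for v in sorted_values if lower_fence <= v <= upper_fence]
    # The adjacent points are the extreme observations inside the fences.
    return lower_fence, upper_fence, inside[0], inside[-1]


# Data from the first post: 19 zeros followed by six larger values.
data = sorted([0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6])

# Quartiles as listed above (25th percentile 0.0, 75th percentile 1.65).
lower_fence, upper_fence, lower_adj, upper_adj = adjacent_points(data, q1=0.0, q3=1.65)
print(lower_fence, upper_fence, lower_adj, upper_adj)
```

With these inputs the only observations inside the fences are the zeros, so the upper adjacent point is 0.0 and lies below the 75th percentile of 1.65. That is why the upper whisker runs back through the box.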
The "problem" is that different programs use different algorithms to calculate percentiles (and therefore the IQR). If I take the data to Excel, I get IQR = 0.0. SPSS in this case returns 3.3 and TeeChart 1.65. I guess we could add different percentile calculation methods to the existing code. We'll log this to our wish list for the next TeeChart release. In the meantime, the best workaround is to manually calculate the necessary statistics outside TeeChart and pass the calculated values to the BoxPlot series.
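To make the spread between 0.0, 1.65 and 3.3 concrete, the sketch below evaluates the 75th percentile of the posted data under three position rules with linear interpolation. Python is used only for illustration, and the helper name `value_at_position` is hypothetical. The `(n - 1) * p + 1` rule is the one Excel's PERCENTILE function uses, and `(n + 1) * p` yields the 3.3 figure. That TeeChart internally uses `n * p + 0.5` is an assumption; the rule merely reproduces 1.65 for this data.

```python
import math


def value_at_position(sorted_values, position):
    """Linearly interpolate at a 1-based, possibly fractional, position."""
    n = len(sorted_values)
    position = min(max(position, 1.0), float(n))  # clamp to the data range
    k = math.floor(position)   # 1-based index of the lower neighbour
    g = position - k           # fractional part, used as interpolation weight
    lower = sorted_values[k - 1]
    upper = sorted_values[min(k, n - 1)]
    return (1 - g) * lower + g * upper


data = sorted([0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6])
n, p = len(data), 0.75

q3_by_rule = {
    "(n - 1) * p + 1": value_at_position(data, (n - 1) * p + 1),  # Excel PERCENTILE
    "n * p + 0.5": value_at_position(data, n * p + 0.5),          # assumed, gives 1.65
    "(n + 1) * p": value_at_position(data, (n + 1) * p),          # gives 3.3
}
for rule, q3 in q3_by_rule.items():
    print(f"{rule:>16}: Q3 = {q3:.2f}")
```

The 25th percentile is 0.0 under all three rules for this data, so the IQR equals Q3 in each case: 0.00, 1.65 and 3.30.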
Marjan Slatinek,
http://www.steema.com
Hi Marjan,
I found a couple of discussions about the calculation of Q1 and Q3 (and with it the IQR) on the web before starting this thread (e.g. http://www.maths.murdoch.edu.au/units/s ... smore.html). Even if this situation is not satisfying at all, I think your proposal should solve the problem from your point of view.
But if you calculate Q1 and Q3 using any type of interpolation, does the given definition of the adjacent points still correspond to the meaning of the whiskers? What is the INTERPRETATION of a box plot where the upper adjacent point is smaller than Q3? Shouldn't Q1 to Q3 be a subset of the lower and the upper adjacent point?
Reading your answer, I guess that you checked SPSS. How does SPSS draw the whisker? (I currently have no active licence.)
Hi, Martin.
Actually, I am using NCSS. For percentiles it uses the following formula:
The 100pth percentile is computed as
Zp = (1-g)X[k1] + gX[k2]
where k1 equals the integer part of p(n+1), k2=k1+1, g is the fractional part of p(n+1), and X[k] is the kth observation when the data are sorted from lowest to highest.
This formula is slightly different from the one TeeChart uses, so you get different percentile and IQR values. Here are the results I get in NCSS:
Code:
median = 0.0
25th percentile = 0.0
75th percentile = 3.3
IQR = 3.3
lower inner fence = 25th PCT - 1.5*IQR = -4.95
upper inner fence = 75th PCT + 1.5*IQR = 8.25
lower adjacent point, defined as smallest value above lower inner fence. In this case, 0.0.
upper adjacent point, defined as largest value below upper inner fence. In this case 7.1.
So, the box is drawn from 0 to 3.3, the lower whisker is drawn at 0.0 and the upper whisker at 7.1, which is OK, as by definition the upper whisker position is less than or equal to the upper inner fence.
"Shouldn't Q1 to Q3 be a subset of the lower and the upper adjacent point?"
Yes, I could limit the lower inner fence upper limit to Q1 and the upper fence lower limit to Q3. But I'll have to check if this is valid.
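The NCSS formula quoted above, together with the idea of limiting the whiskers to the box, might be sketched as follows. This is illustrative Python, not NCSS or TeeChart code. The names `ncss_percentile`, `whisker_ends` and `clamp_to_box` are hypothetical, and the clamping shown is only one possible reading of the proposal, whose statistical validity is left open here.

```python
def ncss_percentile(sorted_values, p):
    """Zp = (1-g)*X[k1] + g*X[k2], k1 = integer part of p*(n+1), k2 = k1 + 1 (1-based)."""
    n = len(sorted_values)
    position = p * (n + 1)
    k1 = int(position)
    g = position - k1
    # Assumption: positions outside 1..n fall back to the extreme observations.
    if k1 < 1:
        return sorted_values[0]
    if k1 >= n:
        return sorted_values[-1]
    return (1 - g) * sorted_values[k1 - 1] + g * sorted_values[k1]


def whisker_ends(sorted_values, q1, q3, clamp_to_box=False):
    """Adjacent points inside the 1.5*IQR fences, optionally kept outside the box."""
    iqr = q3 - q1
    inside = [v for v in sorted_values if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
    lower, upper = inside[0], inside[-1]
    if clamp_to_box:
        # Hypothetical rule: a whisker never ends inside the box.
        lower, upper = min(lower, q1), max(upper, q3)
    return lower, upper


data = sorted([0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6])

q1, q3 = ncss_percentile(data, 0.25), ncss_percentile(data, 0.75)
print(q1, q3, whisker_ends(data, q1, q3))

# Same data with the TeeChart quartiles (0.0 and 1.65) and clamping enabled.
print(whisker_ends(data, 0.0, 1.65, clamp_to_box=True))
```

With the NCSS quartiles the whiskers end at 0.0 and 7.1, matching the results above. With Q3 = 1.65 and clamping enabled, the upper whisker collapses onto the top of the box instead of running back through it.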
Marjan Slatinek,
http://www.steema.com