Ngl.regline
Calculates the linear regression coefficient between two series.
Available in version 1.3.0 or later.
Prototype
rc,attrs = Ngl.regline(x, y, return_info=True)
Arguments
x, y: One-dimensional numpy or masked arrays of the same length. If either one of these is not a masked array, then a fill value of 1e20 will be used for that array.
return_info=True: An optional logical that indicates whether additional calculations should be returned as a dictionary (default is True).
Return values
rc, attrs: rc is a scalar, the regression coefficient. attrs is a dictionary of additional values, returned only if return_info is set to True. See the description below.
Description
Ngl.regline computes the information needed to construct a regression line: regression coefficient (trend, slope,...) and the average of the x and y values. Ngl.regline is designed to work with one-dimensional x and y arrays. Missing data are allowed.
Ngl.regline also returns the following values as a separate dictionary if return_info is True:
- xave (scalar): average of x
- yave (scalar): average of y
- tval (scalar): t-statistic (assuming null-hypothesis)
- rstd (scalar): standard error of the regression coefficient
- yintercept (scalar): y-intercept at x=0
- nptxy (scalar, integer): number of points used
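The returned quantities correspond to the standard least-squares formulas. The following is a minimal pure-Python sketch of those formulas, not PyNGL's actual implementation: the helper name regline_sketch is hypothetical, and it assumes no missing values and at least three points.

```python
import math

def regline_sketch(x, y):
    """Least-squares line through (x, y).

    Hypothetical helper for illustration only; it does not handle
    missing values the way Ngl.regline does.
    """
    n = len(x)
    xave = sum(x) / n
    yave = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - xave) ** 2 for xi in x)
    sxy = sum((xi - xave) * (yi - yave) for xi, yi in zip(x, y))
    rc = sxy / sxx                      # regression coefficient (slope)
    yintercept = yave - rc * xave       # value of the line at x=0
    # Residual sum of squares about the fitted line
    sse = sum((yi - (yintercept + rc * xi)) ** 2 for xi, yi in zip(x, y))
    rstd = math.sqrt(sse / (n - 2) / sxx)   # standard error of rc
    tval = rc / rstd                    # t-statistic for the null hypothesis rc=0
    attrs = {"xave": xave, "yave": yave, "tval": tval,
             "rstd": rstd, "yintercept": yintercept, "nptxy": n}
    return rc, attrs

# Small made-up series, used only to exercise the sketch
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
rc, attrs = regline_sketch(x, y)
print(rc)
print(attrs)
```

The dictionary keys mirror the names listed above so that the sketch can be compared side by side with the output of Ngl.regline.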
Examples
Example 1
The following example was taken from:
Brownlee
Statistical Theory and Methodology
J Wiley 1965 pgs: 342-346 QA276 .B77
The regression line information for the example below is: (a)
rc=0.9746, (b) tval=38.7, (c) nptxy=18 which yields 16 degrees of
freedom (df=nptxy-2). To test the null hypothesis
(i.e., rc=0) at the two-tailed 95% level, we note that t(16) is 2.120
(table look-up: 0.975). Clearly, the calculated t-statistic greatly
exceeds 2.120 so the null hypothesis is rejected at the 5%
level.
Rather than a table lookup, the following could be used to calculate the actual significance level.
alpha = Ngl.betainc(df/(df+attrs["tval"]**2), df/2.0, 0.5)
or, alternatively,
prob = 1 - Ngl.betainc(df/(df+attrs["tval"]**2), df/2.0, 0.5)
Note that "Ngl.betainc" hasn't been implemented yet.
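Until such a function is available, the significance level can be approximated by hand. The following is a minimal sketch, assuming the function in question is the regularized incomplete beta function with arguments in the order (x, a, b). The helper betainc_sketch is hypothetical and integrates with composite Simpson's rule; it assumes a >= 1 (that is, df >= 2) and 0 <= x < 1 (that is, a non-zero tval).

```python
import math

def betainc_sketch(x, a, b, n=2000):
    """Regularized incomplete beta function I_x(a, b) by Simpson's rule.

    Hypothetical helper, not part of PyNGL. Assumes a >= 1 and
    0 <= x < 1 so that the integrand stays finite on [0, x].
    n is the (even) number of Simpson intervals.
    """
    def integrand(u):
        return u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)

    h = x / n
    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        # Simpson weights: 4 for odd interior nodes, 2 for even ones
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0
    # Complete beta function B(a, b) via log-gamma for numerical stability
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)

# Check against the table value: t = 2.120 with 16 degrees of freedom
df = 16
tval = 2.120
alpha = betainc_sketch(df / (df + tval ** 2), df / 2.0, 0.5)
print(alpha)
```

With the table value t=2.120 and df=16, alpha should come out close to 0.05, which is a convenient sanity check before substituting attrs["tval"] from Ngl.regline.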
The example series are:
import numpy
import Ngl

# Ngl.regline expects one-dimensional numpy (or masked) arrays
x = numpy.array([1190., 1455., 1550., 1730., 1745., 1770.,
                 1900., 1920., 1960., 2295., 2335., 2490.,
                 2720., 2710., 2530., 2900., 2760., 3010.])
y = numpy.array([1115., 1425., 1515., 1795., 1715., 1710.,
                 1830., 1920., 1970., 2300., 2280., 2520.,
                 2630., 2740., 2390., 2800., 2630., 2970.])

rc, attrs = Ngl.regline(x, y)
print(rc)
print(attrs)

# Note use of dictionary items
df   = attrs["nptxy"] - 2
tval = attrs["tval"]
yint = attrs["yintercept"]

# Left commented out because Ngl.betainc is not implemented yet
#prob = 1 - Ngl.betainc(df/(df + tval**2), df/2.0, 0.5)
#yReg = rc*x + yint
#print("prob", prob)
#print(yReg)
The first two print statements will yield:
0.974561429694
{'xave': 2165.0, 'rstd': 0.025154607619252603, 'yintercept': 15.352282489361187,
'tval': 38.74285953673624, 'yave': 2125.2777777777778, 'nptxy': 18}
Note 1: The above assumes that all the points are independent. If this is not the case, then the number of degrees of freedom used to test for significance should be smaller.
Note 2: To construct 95% confidence limits and test the hypothesis that the regression coefficient is one (i.e., rc=1):
- As noted above, the t for 0.975 and 16 degrees of freedom is 2.120 [table look-up].
- attrs["rstd"] * 2.12 = 0.053. This yields 95% confidence limits of (0.97-0.053) < rc < (0.97+0.053), or about 0.92 to 1.03. Because this interval contains 1, the hypothesis that rc=1 cannot be rejected.
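The arithmetic in Note 2 can be sketched in a few lines of Python. The values of rc and rstd are copied from the printed output above, and t_crit is the table value quoted in the text; the variable names are chosen here for illustration only.

```python
# Values copied from the example output above
rc = 0.974561429694
rstd = 0.025154607619252603
t_crit = 2.120  # t for 0.975 and 16 degrees of freedom (table look-up)

half_width = t_crit * rstd   # half-width of the 95% confidence interval
lower = rc - half_width
upper = rc + half_width

# rc=1 cannot be rejected if 1 lies inside the interval
contains_one = lower < 1.0 < upper
print(round(half_width, 3), round(lower, 2), round(upper, 2), contains_one)
```

In a script that has already called Ngl.regline, rc and attrs["rstd"] would be used directly instead of the copied constants.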