Ngl.regline
Calculates the linear regression coefficient between two series.
Available in version 1.3.0 or later.
Prototype
rc,attrs = Ngl.regline(x, y, return_info=True)
Arguments
x, y
One-dimensional numpy or masked arrays of the same length. If either one of these is not a masked array, then a fill value of 1e20 will be used for that array.
return_info=True
An optional logical that indicates whether additional calculations should be returned as part of a list (default is True).
Return values
rc, attrs
rc is a scalar: the regression coefficient. If return_info is set to True, some additional values are returned in the dictionary attrs. See the description below.
Description
Ngl.regline computes the information needed to construct a regression line: regression coefficient (trend, slope,...) and the average of the x and y values. Ngl.regline is designed to work with one-dimensional x and y arrays. Missing data are allowed.
Ngl.regline also returns the following values as a separate dictionary if return_info is True:
- xave (scalar): average of x
- yave (scalar): average of y
- tval (scalar): t-statistic (assuming null-hypothesis)
- rstd (scalar): standard error of the regression coefficient
- yintercept (scalar): y-intercept at x=0
- nptxy (scalar, integer): number of points used
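The quantities listed above can be reproduced with ordinary least-squares formulas. The following is a minimal pure-Python sketch, not the PyNGL implementation: the function name regline_sketch and its fill_value argument are hypothetical, and it assumes the standard error is computed with nptxy-2 degrees of freedom, which is consistent with the values quoted in Example 1.

```python
import math


def regline_sketch(x, y, fill_value=1.0e20):
    """Least-squares line through (x, y); pairs holding fill_value are skipped.

    Hypothetical stand-in for illustration only, not Ngl.regline itself.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y)
             if xi != fill_value and yi != fill_value]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 valid (x, y) pairs")

    xave = sum(xi for xi, _ in pairs) / n
    yave = sum(yi for _, yi in pairs) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - xave) ** 2 for xi, _ in pairs)
    sxy = sum((xi - xave) * (yi - yave) for xi, yi in pairs)
    syy = sum((yi - yave) ** 2 for _, yi in pairs)
    if sxx == 0.0:
        raise ValueError("all valid x values are identical")

    rc = sxy / sxx                                  # slope
    resid_var = max(syy - rc * sxy, 0.0) / (n - 2)  # residual variance
    rstd = math.sqrt(resid_var / sxx)               # standard error of rc
    tval = rc / rstd if rstd > 0.0 else float("inf")

    attrs = {
        "xave": xave,
        "yave": yave,
        "tval": tval,
        "rstd": rstd,
        "yintercept": yave - rc * xave,
        "nptxy": n,
    }
    return rc, attrs
```

Calling regline_sketch(x, y) on two equal-length sequences returns the same (rc, attrs) pair shape as the prototype above, so the dictionary keys can be used in the same way.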
Examples
Example 1
The following example was taken from:
Brownlee, Statistical Theory and Methodology, J. Wiley, 1965, pp. 342-346 (QA276 .B77)
The regression line information for the example below is: (a) rc=0.9746, (b) tval=38.7, (c) nptxy=18, which yields 16 degrees of freedom (df=nptxy-2). To test the null hypothesis (i.e., rc=0) at the two-tailed 95% level, we note that t(16) is 2.120 (table look-up: 0.975). Clearly, the calculated t-statistic greatly exceeds 2.120, so the null hypothesis is rejected at the 5% level.
Rather than a table lookup, the following could be used to calculate the actual significance level.
    alpha = Ngl.betainc(df/(df+attrs["tval"]**2), df/2.0, 0.5)
or, alternatively,
    prob = 1 - Ngl.betainc(df/(df+attrs["tval"]**2), df/2.0, 0.5)
Note that "Ngl.betainc" hasn't been implemented yet. The example series are:
    import numpy
    import Ngl

    x = numpy.array([1190., 1455., 1550., 1730., 1745., 1770.,
                     1900., 1920., 1960., 2295., 2335., 2490.,
                     2720., 2710., 2530., 2900., 2760., 3010.])
    y = numpy.array([1115., 1425., 1515., 1795., 1715., 1710.,
                     1830., 1920., 1970., 2300., 2280., 2520.,
                     2630., 2740., 2390., 2800., 2630., 2970.])

    rc, attrs = Ngl.regline(x, y)
    print(rc)
    print(attrs)

    # Note use of dictionary items
    df   = attrs["nptxy"] - 2
    tval = attrs["tval"]
    yint = attrs["yintercept"]

    #prob = 1 - Ngl.betainc(df/(df+tval**2), df/2.0, 0.5)
    #yReg = rc*x + yint
    #print("prob", prob)
    #print(yReg)
The first two print calls will yield output similar to:
0.974561429694
{'xave': 2165.0, 'rstd': 0.025154607619252603, 'yintercept': 15.352282489361187, 'tval': 38.74285953673624, 'yave': 2125.2777777777778, 'nptxy': 18}
Note 1: The above assumes that all the points are independent. If this is not the case, then the number used to test for significance should be less.
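Because "Ngl.betainc" is not available, the significance level has to be computed some other way. The following is a minimal sketch of a stand-in, assuming the regularized incomplete beta function is evaluated with the usual continued-fraction expansion. The names betainc_sketch and _betacf are hypothetical and are not part of PyNGL; the argument order (x, a, b) mirrors the calls shown above. If SciPy is installed, scipy.special.betainc computes the same function, but takes its arguments in the order (a, b, x).

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1.0e-14, tiny=1.0e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    def nonzero(v):
        # Keep denominators away from zero.
        return v if abs(v) >= tiny else tiny

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / nonzero(1.0 + aa * d)
        c = nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / nonzero(1.0 + aa * d)
        c = nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_sketch(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the expansion that converges faster for this x.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


# Values taken from the example output above.
df = 16
tval = 38.74285953673624
alpha = betainc_sketch(df / (df + tval**2), df / 2.0, 0.5)  # two-tailed level
prob = 1 - alpha
print(alpha, prob)
```

For this example alpha is many orders of magnitude below 0.05, which agrees with the table look-up conclusion that the null hypothesis is rejected.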
Note 2: To construct 95% confidence limits for testing the hypothesis that the regression coefficient is one (i.e., rc=1):
- As noted above, the t for 0.975 and 16 degrees of freedom is 2.120 [table look-up].
- attrs["rstd"] * 2.12 = 0.053. This yields 95% confidence limits of (0.97-0.053) < rc < (0.97+0.053), or 0.92 to 1.03. Since this interval contains 1, the hypothesis that rc=1 cannot be rejected.
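The arithmetic in Note 2 can be written out as a short script. This is only a sketch: rc and rstd are copied from the example output above, t_crit is the table value quoted in the text, and the variable names are chosen here for illustration.

```python
# Values copied from the example output; t_crit is the table look-up
# for 0.975 and 16 degrees of freedom.
rc = 0.974561429694
rstd = 0.025154607619252603
t_crit = 2.120

half_width = t_crit * rstd          # about 0.053
lower = rc - half_width
upper = rc + half_width

# The hypothesis rc=1 is rejected only if 1 falls outside the limits.
hypothesized_rc = 1.0
rejected = not (lower <= hypothesized_rc <= upper)

print("95% confidence limits: {:.3f} to {:.3f}".format(lower, upper))
print("reject rc=1: {}".format(rejected))
```

The same pattern applies to any other hypothesized slope: replace hypothesized_rc and check whether it lies between lower and upper.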