(1/3) Linear model

Umut Caglar

2017-03-18

The Line Fit Function

This document investigates the details of the linear model.

Data generation

To simulate the results, we work backwards: first we generate some data to analyze. To add some randomness to the input data, we add noise. The input to all package functions must be a data frame with at least two columns, time and intensity.

sicegar::lineFitFormula generates a set of intensity values from the supplied time, slope and intersection values. Here we generate a set of points on a line with a slope of 4 that intersects the intensity (y) axis at -2, i.e. the intensity at time 0 is -2.

time=seq(3,24,0.5)

#intensity with Noise
noise_parameter=7
intensity_noise=stats::runif(n = length(time),min = 0,max = 1)*noise_parameter
intensity=sicegar::lineFitFormula(time, slope=4, intersection=-2)
intensity=intensity+intensity_noise

dataInput=data.frame(intensity=intensity,time=time)
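To make the line formula explicit, here is a minimal base-R sketch. It assumes that sicegar::lineFitFormula returns intersection + slope * time; the helper name line_formula is hypothetical and is not part of sicegar. A seed is fixed only so the noisy data are reproducible.

```r
# Hypothetical base-R stand-in for sicegar::lineFitFormula(), assuming it
# returns intersection + slope * x (intersection = intensity at time 0).
line_formula <- function(x, slope, intersection) {
  intersection + slope * x
}

time <- seq(3, 24, 0.5)                      # 43 time points
clean <- line_formula(time, slope = 4, intersection = -2)

set.seed(1)                                  # reproducible noise
noise_parameter <- 7
noisy <- clean + stats::runif(n = length(time), min = 0, max = 1) * noise_parameter

head(clean)  # 10 12 14 16 18 20
```

Because the noise is drawn from the interval [0, 7] rather than centred on zero, it raises the intensity by 3.5 on average. This is why the fitted intersection_Estimate reported later (about 1.45) sits near -2 + 3.5 rather than near -2.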

Data normalization

This is the first step. Data should be normalized before any fit, i.e. time and intensity should be scaled into the 0-1 interval.

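There is a nuance: time is only divided by its maximum (so time 0 stays at 0), while intensity is first shifted by its minimum and then divided by its range. The steps are: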

timeData=dataInput$time
timeRatio=max(timeData)
timeData=timeData/timeRatio
intensityMin = min(dataInput$intensity)
intensityMax = max(dataInput$intensity)
intensityRatio = intensityMax - intensityMin

intensityData=dataInput$intensity-intensityMin
intensityData=intensityData/intensityRatio
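A minimal, self-contained sketch of these two transformations and of their inverse may help. The helpers normalize_data and denormalize_data are hypothetical; they only mirror the steps above and are not sicegar::normalizeData itself. The output list uses the same component names as the sicegar output shown below so the two are easy to compare.

```r
# Hypothetical helpers mirroring the steps above:
# time is divided by its maximum; intensity is min-max scaled.
normalize_data <- function(dataInput) {
  timeRatio <- max(dataInput$time)
  intensityMin <- min(dataInput$intensity)
  intensityMax <- max(dataInput$intensity)
  intensityRatio <- intensityMax - intensityMin
  list(
    timeIntensityData = data.frame(
      time = dataInput$time / timeRatio,
      intensity = (dataInput$intensity - intensityMin) / intensityRatio
    ),
    dataScalingParameters = c(timeRatio = timeRatio,
                              intensityMin = intensityMin,
                              intensityMax = intensityMax,
                              intensityRatio = intensityRatio)
  )
}

# Undo the scaling using the stored scaling parameters.
denormalize_data <- function(normalized) {
  p <- normalized$dataScalingParameters
  data.frame(
    time = normalized$timeIntensityData$time * p[["timeRatio"]],
    intensity = normalized$timeIntensityData$intensity * p[["intensityRatio"]] +
      p[["intensityMin"]]
  )
}

set.seed(1)
time <- seq(3, 24, 0.5)
dataInput <- data.frame(intensity = 4 * time - 2 + stats::runif(length(time)) * 7,
                        time = time)
normalized <- normalize_data(dataInput)
range(normalized$timeIntensityData$intensity)  # 0 1
normalized$timeIntensityData$time[1]           # 0.125 (= 3 / 24)
```

Storing the four scaling parameters is what makes the round trip back to the original scale possible.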

In practice, the normalization is done with sicegar::normalizeData():

normalizedInput = sicegar::normalizeData(dataInput = dataInput, 
                                         dataInputName = "Sample001")

Components of the normalization output

head(normalizedInput$timeIntensityData) # the normalized time and intensity data
##        time  intensity
## 1 0.1250000 0.00000000
## 2 0.1458333 0.01754799
## 3 0.1666667 0.02066282
## 4 0.1875000 0.01198153
## 5 0.2083333 0.05260639
## 6 0.2291667 0.06833362
print(normalizedInput$dataScalingParameters) # the normalization parameters that are needed to go back to the original scale
##      timeRatio   intensityMin   intensityMax intensityRatio 
##       24.00000       15.58707       94.13809       78.55103
print(normalizedInput$dataInputName) # a name used to track the sample throughout the analysis
## [1] "Sample001"

The figures of raw and normalized datasets

Line fit of the data

Now it is time to calculate the parameters by using sicegar::lineFitFunction():

parameterVector<-sicegar::lineFitFunction(dataInput = normalizedInput, tryCounter = 2)

# tryCounter is usually supplied by sicegar::fitFunction when sicegar::lineFitFunction is called from sicegar::fitFunction.

# If tryCounter == 1, the function uses the start position given by sicegar::fitFunction.
# If tryCounter != 1, it generates a random start position within a given interval.
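The start-point rule in the comments above can be sketched as follows. This is only an illustration under stated assumptions: the helper choose_start and the interval bounds are hypothetical, and the actual logic and bounds inside sicegar::lineFitFunction may differ.

```r
# Hypothetical sketch of the tryCounter rule; not sicegar's actual code.
choose_start <- function(tryCounter, givenStart, lower, upper) {
  if (tryCounter == 1) {
    return(givenStart)  # first try: use the start supplied by the caller
  }
  # later tries: draw each parameter uniformly within its interval
  stats::setNames(stats::runif(length(lower), min = lower, max = upper),
                  names(lower))
}

lower <- c(slope = -100, intersection = -100)  # assumed bounds, for illustration
upper <- c(slope = 100, intersection = 100)

choose_start(1, c(slope = 1, intersection = 0), lower, upper)  # the given start
set.seed(2)
choose_start(2, c(slope = 1, intersection = 0), lower, upper)  # a random start
```

Drawing random starts on repeated tries is a common way to escape a failed or poorly converged first attempt.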

The function returns a vector of named parameters (printed here transposed for readability):

print(t(parameterVector))
##                                      [,1]          
## slope_N_Estimate                     "1.229064"    
## slope_Std_Error                      "0.01569797"  
## slope_t_value                        "78.29448"    
## slope_Pr_t                           "2.844619e-46"
## intersection_N_Estimate              "-0.1799898"  
## intersection_Std_Error               "0.009718127" 
## intersection_t_value                 "-18.52104"   
## intersection_Pr_t                    "1.580608e-21"
## residual_Sum_of_Squares              "0.02903872"  
## log_likelihood                       "95.94264"    
## AIC_value                            "-185.8853"   
## BIC_value                            "-180.6017"   
## isThisaFit                           "TRUE"        
## startVector.slope                    "-32.97216"   
## startVector.intersection             "50.24727"    
## dataScalingParameters.timeRatio      "24"          
## dataScalingParameters.intensityMin   "15.58707"    
## dataScalingParameters.intensityMax   "94.13809"    
## dataScalingParameters.intensityRatio "78.55103"    
## model                                "linaer"      
## intersection_Estimate                "1.448682"    
## slope_Estimate                       "4.022677"

Here are brief explanations of the parameters returned by sicegar::lineFitFunction (in a different order than in the output vector of sicegar::lineFitFunction).

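These are the parameters of the normalization step: dataScalingParameters.timeRatio, dataScalingParameters.intensityMin, dataScalingParameters.intensityMax and dataScalingParameters.intensityRatio.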

These give a meta summary of the result: model (the type of model that was fitted) and isThisaFit (whether the fit was successful).

startVector.slope and startVector.intersection: the likelihood maximization algorithm starts from a random initial point (if tryCounter != 1) and descends the error surface with a gradient-based (Levenberg-Marquardt) algorithm. These parameters record the starting point of that descent.
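For a model that is linear in its parameters, the least-squares minimum is unique, so the start point should change only the path of the optimizer, not the answer. The sketch below illustrates this with base R's stats::nls, which by default uses the Gauss-Newton algorithm rather than Levenberg-Marquardt (a swapped-in method, chosen only because it ships with base R), started from a random point and compared against the closed-form stats::lm solution. The simulated data are illustrative, not the data from this vignette.

```r
set.seed(4)
time <- seq(3, 24, 0.5) / 24                 # normalized-style time in (0, 1]
intensity <- 1.2 * time - 0.2 + stats::rnorm(length(time), sd = 0.03)
df <- data.frame(time = time, intensity = intensity)

# Random start, analogous to tryCounter != 1
start <- list(slope = stats::runif(1, -50, 50),
              intersection = stats::runif(1, -50, 50))

# Iterative least squares from the random start (Gauss-Newton in base R)
fit_nls <- stats::nls(intensity ~ slope * time + intersection,
                      data = df, start = start)

# Closed-form least squares for comparison
fit_lm <- stats::lm(intensity ~ time, data = df)

coef(fit_nls)  # named "slope", "intersection"
coef(fit_lm)   # named "(Intercept)", "time"
```

Both fits should agree to numerical precision, which is why recording startVector values matters mainly for reproducibility and diagnosing failed fits rather than for the final estimates of a line fit.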

For each parameter that is fitted by the LM algorithm, the algorithm reports a set of statistics, including the estimated value of the parameter. Note: these values refer to the normalized data.
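These per-parameter statistics appear to follow the usual least-squares coefficient table: estimate, standard error, t value and p-value. For example, the printed slope_t_value is slope_N_Estimate / slope_Std_Error = 1.229064 / 0.01569797, about 78.29. A minimal base-R cross-check with stats::lm on illustrative simulated data (not the vignette's data) shows the same four columns:

```r
set.seed(5)
time <- seq(3, 24, 0.5) / 24
intensity <- 1.2 * time - 0.2 + stats::rnorm(length(time), sd = 0.03)
fit <- stats::lm(intensity ~ time)

coefTable <- summary(fit)$coefficients
# Rows: "(Intercept)" (intersection) and "time" (slope)
# Columns: "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefTable

# The t value is the estimate divided by its standard error
coefTable[, "Estimate"] / coefTable[, "Std. Error"]
```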

slope_N_Estimate, slope_Std_Error, slope_t_value and slope_Pr_t are the parameters associated with the parameter “slope”.

intersection_N_Estimate, intersection_Std_Error, intersection_t_value and intersection_Pr_t are the parameters associated with the parameter “intersection”.

Here are the fit parameters that are not related to an individual fitted variable but give information about the overall fit.

residual_Sum_of_Squares, log_likelihood, AIC_value and BIC_value are the parameters associated with the quality of the fit.
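The printed quality values appear to follow the standard Gaussian formulas with k = 3 estimated parameters (slope, intersection and the residual variance) and n = 43 points: logLik = -n/2 (log(2 pi) + log(RSS/n) + 1), AIC = -2 logLik + 2k, BIC = -2 logLik + k log(n). With RSS = 0.02903872 this gives a log-likelihood of about 95.94, AIC = -2(95.94264) + 6 = -185.8853 and BIC = -191.8853 + 3 log(43), about -180.60, consistent with the output above. The sketch below checks these formulas against base R's logLik, AIC and BIC on illustrative simulated data:

```r
set.seed(6)
time <- seq(3, 24, 0.5) / 24
intensity <- 1.2 * time - 0.2 + stats::rnorm(length(time), sd = 0.03)
fit <- stats::lm(intensity ~ time)

n   <- length(intensity)             # 43 data points
rss <- sum(stats::residuals(fit)^2)  # residual sum of squares
k   <- 3                             # slope, intersection, residual variance

# Gaussian log-likelihood evaluated at the maximum-likelihood variance rss / n
logLikManual <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
aicManual    <- -2 * logLikManual + 2 * k
bicManual    <- -2 * logLikManual + k * log(n)

c(logLik = logLikManual, AIC = aicManual, BIC = bicManual)
```

Lower AIC and BIC values indicate a better trade-off between fit quality and model complexity, which is useful when the same data are also fitted with other models.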

Final results that are relevant to most users

slope_Estimate and intersection_Estimate are the fitted values after converting everything from the normalized scale back to the original (un-normalized) scale.
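The conversion follows from the normalization formulas. With t_N = t / timeRatio and I_N = (I - intensityMin) / intensityRatio, the normalized fit I_N = slope_N * t_N + intersection_N rearranges to I = (slope_N * intensityRatio / timeRatio) * t + (intersection_N * intensityRatio + intensityMin). The short check below plugs in the values printed above:

```r
# Values copied from the printed output above
slope_N        <- 1.229064    # slope_N_Estimate
intersection_N <- -0.1799898  # intersection_N_Estimate
timeRatio      <- 24
intensityMin   <- 15.58707
intensityRatio <- 78.55103

# Rearranging the normalized line for the original-scale intensity I
slope_Estimate        <- slope_N * intensityRatio / timeRatio
intersection_Estimate <- intersection_N * intensityRatio + intensityMin

round(c(slope_Estimate, intersection_Estimate), 4)  # 4.0227 1.4487
```

Both numbers agree with slope_Estimate (4.022677) and intersection_Estimate (1.448682) up to the rounding of the printed inputs.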

Check the results to see if they are meaningful

Using the intersection_Estimate and slope_Estimate parameters of the line fit and the time sequence we already created, we can calculate the intensity values with sicegar::lineFitFormula() and draw the best-fit line on top of our initial data.

intensityTheoretical=sicegar::lineFitFormula(time,
                                             slope=parameterVector$slope_Estimate,
                                             intersection=parameterVector$intersection_Estimate)
comparisonData=cbind(dataInput,intensityTheoretical)

# aes() is namespaced too, so the plot works without library(ggplot2)
ggplot2::ggplot(comparisonData)+
  ggplot2::geom_point(ggplot2::aes(x=time, y=intensity))+
  ggplot2::geom_line(ggplot2::aes(x=time, y=intensityTheoretical))+
  ggplot2::expand_limits(x = 0, y = 0)