This document investigates the details of the linear model.
To simulate the results, we will work backwards and first generate some data to analyze. To add some randomness to the input data, we add uniform noise. The input to every function in the package must be a data frame with at least two columns, time and intensity.
sicegar::lineFitFormula generates a set of intensity values from the supplied time, slope, and intersection values. Here we generate a set of points that lie on a line with a slope of 4 that intersects the y axis at -2 (the intensity at time 0).
time=seq(3,24,0.5)
#intensity with Noise
noise_parameter=7
intensity_noise=stats::runif(n = length(time),min = 0,max = 1)*noise_parameter
intensity=sicegar::lineFitFormula(time, slope=4, intersection=-2)
intensity=intensity+intensity_noise
dataInput=data.frame(intensity=intensity,time=time)
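As a quick sanity check on the simulated data, the sketch below repeats the generation in base R and fits an ordinary least-squares line with stats::lm. It assumes that sicegar::lineFitFormula evaluates slope * time + intersection, and the seed is an arbitrary choice made only for reproducibility. Because uniform noise on [0, 7] has a mean of 3.5, the recovered intercept should land near -2 + 3.5 = 1.5 rather than -2, while the slope stays close to 4.

```r
# Sketch only: base-R stand-in for the data generation above.
set.seed(1)  # arbitrary seed, not part of the original workflow
time <- seq(3, 24, 0.5)
noise_parameter <- 7
# Assumed form of the line formula: slope * time + intersection
intensity <- 4 * time - 2 + stats::runif(length(time), min = 0, max = 1) * noise_parameter
# Ordinary least-squares fit used purely as a cross-check
checkCoef <- stats::coef(stats::lm(intensity ~ time))
print(checkCoef)  # slope close to 4; intercept shifted upward by the mean noise
```

The upward shift of the intercept is a property of the one-sided noise, not a fitting error; centering the noise around zero would remove it.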
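Normalization is the first step: data should be normalized before any fit, i.e. time and intensity should both lie in the interval 0-1.
There is a nuance: time is only divided by its maximum, so it is rescaled but not shifted to start at 0, whereas intensity is shifted by its minimum and then divided by its range.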
timeData=dataInput$time
timeRatio=max(timeData)
timeData=timeData/timeRatio
intensityMin = min(dataInput$intensity)
intensityMax = max(dataInput$intensity)
intensityRatio = intensityMax - intensityMin
intensityData=dataInput$intensity-intensityMin
intensityData=intensityData/intensityRatio
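The manual steps above can be collected into one helper together with its inverse. This is a sketch that assumes normalization works exactly as in the lines shown; normalizeSketch and denormalizeSketch are hypothetical names used for illustration, not sicegar functions, and toyData is made-up input.

```r
# Hypothetical helper mirroring the manual normalization steps above.
normalizeSketch <- function(dataInput) {
  timeRatio <- max(dataInput$time)
  intensityMin <- min(dataInput$intensity)
  intensityMax <- max(dataInput$intensity)
  intensityRatio <- intensityMax - intensityMin
  list(
    timeIntensityData = data.frame(
      time = dataInput$time / timeRatio,  # scaled only, not shifted
      intensity = (dataInput$intensity - intensityMin) / intensityRatio  # shifted, then scaled
    ),
    dataScalingParameters = c(timeRatio = timeRatio, intensityMin = intensityMin,
                              intensityMax = intensityMax, intensityRatio = intensityRatio)
  )
}

# Hypothetical inverse: maps normalized data back to the original scale.
denormalizeSketch <- function(normalized) {
  p <- normalized$dataScalingParameters
  data.frame(
    time = normalized$timeIntensityData$time * p[["timeRatio"]],
    intensity = normalized$timeIntensityData$intensity * p[["intensityRatio"]] +
      p[["intensityMin"]]
  )
}

toyData <- data.frame(time = c(3, 6, 12, 24), intensity = c(10, 22, 46, 94))
toyNorm <- normalizeSketch(toyData)
print(range(toyNorm$timeIntensityData$time))       # starts at 3/24, ends at 1
print(range(toyNorm$timeIntensityData$intensity))  # spans 0 to 1
roundTrip <- denormalizeSketch(toyNorm)
print(all.equal(roundTrip, toyData))
```

Keeping the scaling parameters next to the normalized data is what makes the inverse possible later, when fitted values have to be reported on the original scale.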
The same normalization is performed by sicegar::normalizeData:
normalizedInput = sicegar::normalizeData(dataInput = dataInput,
dataInputName = "Sample001")
Components of the normalization output
head(normalizedInput$timeIntensityData) # the normalized time and intensity data
## time intensity
## 1 0.1250000 0.00000000
## 2 0.1458333 0.01754799
## 3 0.1666667 0.02066282
## 4 0.1875000 0.01198153
## 5 0.2083333 0.05260639
## 6 0.2291667 0.06833362
print(normalizedInput$dataScalingParameters) # the normalization parameters that are needed to go back to the original scale
## timeRatio intensityMin intensityMax intensityRatio
## 24.00000 15.58707 94.13809 78.55103
print(normalizedInput$dataInputName) # useful for tracking the sample throughout the process
## [1] "Sample001"
Now it is time to estimate the parameters using sicegar::lineFitFunction().
parameterVector<-sicegar::lineFitFunction(dataInput = normalizedInput, tryCounter = 2)
# tryCounter is normally supplied by sicegar::fitFunction when it calls sicegar::lineFitFunction.
# If tryCounter==1, the start position given by sicegar::fitFunction is used.
# If tryCounter!=1, a random start position is generated from a given interval.
The function outputs a vector that gives information about multiple parameters.
print(t(parameterVector))
## [,1]
## slope_N_Estimate "1.229064"
## slope_Std_Error "0.01569797"
## slope_t_value "78.29448"
## slope_Pr_t "2.844619e-46"
## intersection_N_Estimate "-0.1799898"
## intersection_Std_Error "0.009718127"
## intersection_t_value "-18.52104"
## intersection_Pr_t "1.580608e-21"
## residual_Sum_of_Squares "0.02903872"
## log_likelihood "95.94264"
## AIC_value "-185.8853"
## BIC_value "-180.6017"
## isThisaFit "TRUE"
## startVector.slope "-32.97216"
## startVector.intersection "50.24727"
## dataScalingParameters.timeRatio "24"
## dataScalingParameters.intensityMin "15.58707"
## dataScalingParameters.intensityMax "94.13809"
## dataScalingParameters.intensityRatio "78.55103"
## model "linaer"
## intersection_Estimate "1.448682"
## slope_Estimate "4.022677"
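The two raw-scale estimates at the end of this output can be reproduced from the normalized estimates and the scaling parameters. The sketch below assumes that the back-transformation follows directly from the normalization formulas (time divided by timeRatio, intensity shifted by intensityMin and divided by intensityRatio); it is derived here for illustration rather than taken from the sicegar source. Substituting the normalized variables into the line equation gives slope = slope_N * intensityRatio / timeRatio and intersection = intersection_N * intensityRatio + intensityMin.

```r
# Values copied from the printed output above (printed with limited precision).
slopeN <- 1.229064
intersectionN <- -0.1799898
timeRatio <- 24
intensityMin <- 15.58707
intensityRatio <- 78.55103

# Back-transformation derived from the normalization formulas (an assumption).
slopeRaw <- slopeN * intensityRatio / timeRatio
intersectionRaw <- intersectionN * intensityRatio + intensityMin
print(c(slope = slopeRaw, intersection = intersectionRaw))
# Compare with slope_Estimate and intersection_Estimate in the output;
# small differences come from rounding of the printed inputs.
```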
Here are brief explanations of the parameters returned by sicegar::lineFitFunction (in a different order than in the output vector of sicegar::lineFitFunction).
These are the parameters of the normalization step:
dataScalingParameters.timeRatio: Maximum of the raw time data
dataScalingParameters.intensityMin: Minimum of the raw intensity data
dataScalingParameters.intensityMax: Maximum of the raw intensity data
dataScalingParameters.intensityRatio: Maximum minus minimum of the raw intensity data
These are the meta summary of the result parameters:
model: The model used for fitting
isThisaFit: FALSE means there was no successful fit; TRUE means there was at least one successful fit
The likelihood maximization algorithm starts from a random initiation point (if tryCounter!=1) and moves through the fitness space by a gradient descent algorithm. These parameters give the start point of that algorithm:
startVector.slope: Slope value of the initiation point
startVector.intersection: Intersection value of the initiation point
For each parameter fitted by the LM algorithm, the algorithm returns several statistical quantities, including the estimated value of the parameter. Note: these refer to the normalized data.
These are the parameters associated with the parameter “slope”:
slope_N_Estimate: Slope estimate; N indicates the normalized scale
slope_Std_Error
slope_t_value
slope_Pr_t
These are the parameters associated with the parameter “intersection”:
intersection_N_Estimate: Intersection estimate; N indicates the normalized scale
intersection_Std_Error
intersection_t_value
intersection_Pr_t
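The t value and Pr_t entries appear to follow the usual regression-summary conventions: the t value is the estimate divided by its standard error, and Pr_t is the two-sided p-value of that statistic. The sketch below re-derives them from the printed output, assuming n - 2 residual degrees of freedom (43 time points, 2 fitted parameters); the degrees-of-freedom choice is an assumption and is not stated in the output itself.

```r
# Estimates and standard errors copied from the printed output above.
n <- 43                # length(seq(3, 24, 0.5))
dfResidual <- n - 2    # assumed residual degrees of freedom

slopeT <- 1.229064 / 0.01569797
intersectionT <- -0.1799898 / 0.009718127

# Two-sided p-values from the Student t distribution
slopeP <- 2 * stats::pt(-abs(slopeT), df = dfResidual)
intersectionP <- 2 * stats::pt(-abs(intersectionT), df = dfResidual)

print(c(slopeT = slopeT, intersectionT = intersectionT))
print(c(slopeP = slopeP, intersectionP = intersectionP))
```

Both p-values are extremely small here, which simply reflects that the simulated data follow a clear line relative to the noise level.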
The following fit parameters are not tied to an individual fitted variable but describe the overall fit.
They are the parameters associated with the quality of the fit:
residual_Sum_of_Squares: A smaller value indicates a better fit
log_likelihood: A higher value indicates a better fit
AIC_value: A smaller value indicates a better fit
BIC_value: A smaller value indicates a better fit
Final results that are relevant to most users
These are the fitted values after converting everything from the normalized to the un-normalized scale:
intersection_Estimate: Intersection estimate for the raw data
slope_Estimate: Slope estimate for the raw data
Using the intersection_Estimate and slope_Estimate parameters of the line fit, together with the time sequence that we already created, we can calculate the intensity values with the help of sicegar::lineFitFormula(). We can then draw the best-fit line on top of our initial data.
intensityTheoretical=sicegar::lineFitFormula(time,
slope=parameterVector$slope_Estimate,
intersection=parameterVector$intersection_Estimate)
comparisonData=cbind(dataInput,intensityTheoretical)
ggplot2::ggplot(comparisonData)+
ggplot2::geom_point(ggplot2::aes(x=time, y=intensity))+
ggplot2::geom_line(ggplot2::aes(x=time,y=intensityTheoretical))+
ggplot2::expand_limits(x = 0, y = 0)
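The fit-quality values reported earlier are tied together by standard definitions, which allows a quick consistency check. The sketch below assumes a Gaussian log-likelihood of the kind R reports for least-squares fits and counts three parameters (slope, intersection, and the residual variance); both assumptions are inferred from the printed numbers rather than confirmed from the sicegar source.

```r
# Inputs copied from the printed output and the simulated time sequence.
n <- 43             # number of data points, length(seq(3, 24, 0.5))
rss <- 0.02903872   # residual_Sum_of_Squares (normalized scale)
k <- 3              # assumed parameter count: slope, intersection, residual variance

# Gaussian log-likelihood evaluated at the least-squares solution
logLikSketch <- -n / 2 * (log(2 * pi) + 1 - log(n) + log(rss))
aicSketch <- 2 * k - 2 * logLikSketch
bicSketch <- k * log(n) - 2 * logLikSketch

print(c(logLik = logLikSketch, AIC = aicSketch, BIC = bicSketch))
# Compare with log_likelihood, AIC_value and BIC_value in the output above.
```

Because these quantities are computed on normalized data, they are best used to compare models fitted to the same normalized input rather than as absolute measures.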