Changes TDMR package

V 0.4.0 – 21.11.2012

Simplified the code / the workflow (10/2012):

·         Improved modularity: different parts of TDMR construct different objects:

o   the user (or tdmDefaultsFill) constructs tdm

o   envT = tdmEnvTMakeNew(tdm) constructs envT

o   envT = tdmEnvTAddBstRes(envT,fileRData) augments envT by bstGrid, resGrid from .RData file (if needed)

o   envT = tdmBigLoop(envT,spotStep) does (tuning and) unbiased runs.

·         tdmBigLoop is the new function that supersedes the now deprecated tdmCompleteEval:

o   only two parameters: envT, spotStep

o   since envT is passed instead of tdm, we are more flexible in which input we send to tdmBigLoop. Example: if spotStep=="rep", tdmBigLoop requires the data frames bst and res from prior tuning runs → this is not possible via tdm, but can easily be done via envT$bstGrid, envT$resGrid

o   simplified the tdm$fileMode section (no more writing and copying of .res or .bst files, which makes the code much simpler to understand) → bst and res are returned / passed via envT

o   tdm$fileMode=FALSE is now the default. tdm$fileMode=TRUE is deprecated and only leads to the writing of .fin and .exp files (these files are not really necessary, since we store envT with theFinals in the .RData file)

o   always envT$spotConfig$spot.fileMode=FALSE 

·         tdmCompleteEval is still there for backward compatibility, but it is deprecated:

o   it writes .res, .bst, .fin, .exp files, if tdm$fileMode=TRUE

o   tdmCompleteEval has different calling arguments

o   tdmCompleteEval now sets envT$spotConfig$spot.fileMode=tdm$fileMode (was done before in tdmDispatchTuner)

o   tdmCompleteEval will become obsolete once all demos / user files are changed to tdmBigLoop (but we may keep it for backward compatibility)

·         Simplified envT$result, which contained opts three times → now it contains opts only once, plus the accessor function Opts().

·         Reformulated the tdm$fileMode sections in tdmCompleteEval / tdmBigLoop: The normal case is now tdm$fileMode==FALSE.

·         Abandoned the writing of <name>_train.csv.SRF.<target>.RData, <name>_train.log and <name>_train_eval.csv when tdm$oFileMode==FALSE, since this may cause conflicts when certain tasks run in parallel.

·         tdmGetObj is now marked as deprecated (we still use it in unbiasedRun to ensure backward compatibility).

·         Renamed “Test2” → “Vali2” and resolved other naming issues around “Test” and “Validation”. Made variable names more meaningful: VALI, if connected with validation data, TST, if connected with test data.

Simplified the code / the workflow (06/2012):

·         Simplified start of parallel execution: no need for sourcing start.tdm.r (except if you want the R developer sources), all sfExport-related code is now in function prepareParallelExec in tdmBigLoop.r.

·         Simplified design mapping: only one function pair {tdmMapDesLoad, tdmMapDesApply} and no longer tdmMapDesSpot, no makeTdmDesSpot. The maps map (from tdmMapDesign.csv) and mapUser (from userMapDesign.csv) are now stored in list tdm.

·         Simplified the triangle startFromSource.r, start.tdm.r, source.tdm.r: startFromSource.r and start.tdm.r are now only needed for the developer (if you want to start from R sources). They are NO LONGER needed if the normal TDMR user wants to initiate parallel execution (all sfExport calls and the like are now done in function prepareParallelExec in tdmBigLoop.r, which is called if tdm$parallelCPUs>1).

·         Warning: if tdm$umode="TST" *and* opts$TST.kind="col", then tdmSplitTestData will tag all records with opts$TST.col!=0 as test data. Later on, tdmStartSpot will hand only the data with opts$TST.col==0 to main_TASK, which will split them into vali and train data according to opts$TST.col again → all data are train, there are no vali data (this is o.k. for opts$MOD.method=="RF", but may lead to strange results in other cases). How to fix:

o   Make a check on the number of vali records for cases opts$MOD.method!="RF".

o   Issue a warning in tdmBigLoop/tdmCompleteEval, if tdm$umode="TST" and opts$TST.kind="col" and opts$MOD.method!="RF".
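The second fix could look roughly like the following sketch. The helper name checkTstColSetup is hypothetical and not part of TDMR; only the three settings named above are taken from this changelog.

```r
# Hypothetical helper (not a TDMR function): warn about the pitfall described above.
checkTstColSetup <- function(tdm, opts) {
  risky <- ("TST" %in% tdm$umode) &&
           identical(opts$TST.kind, "col") &&
           !identical(opts$MOD.method, "RF")
  if (risky) {
    warning("tdm$umode=\"TST\" with opts$TST.kind=\"col\": no validation records remain, ",
            "results may be unreliable for opts$MOD.method != \"RF\"")
  }
  invisible(risky)   # TRUE if the risky combination was detected
}

# Illustration with plain lists standing in for tdm and opts
tdm  <- list(umode = "TST")
opts <- list(TST.kind = "col", MOD.method = "SVM")
flagged <- suppressWarnings(checkTstColSetup(tdm, opts))
```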

Docu TDMR and Demos TDMR:

·         added TDMR-tutorial.html, moved the section “Example usage” in there.

·         added a FAQ section (“How to”) in TDMR-tutorial.html

·         added two appendices on tdmMapDesign.csv and on elements of opts in TDM-docu.html

·         adapted all documentation & demos to the new tdmBigLoop

·         added citation ROCR

Modified the function tdmModAdjustCutoff:

·         Extended so that any one of the parameters CUTOFF1, … , CUTOFFn can be the missing (dependent) one.

·         Guaranteed that the dependent CUTOFF can never become negative when enforcing the constraint.

·         If tdmModAdjustCutoff is entered with a cutoff with length(cutoff)==n.class-1, then cutoff[n.class] becomes the dependent CUTOFF.

·         The old function tdmMapCutoff is now disabled; everything is handled in tdmModAdjustCutoff.
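The cutoff handling above can be illustrated with a small sketch. It is not TDMR's implementation: the function adjustCutoff is hypothetical, the constraint is assumed to be sum(cutoff)==1, the missing cutoff is marked with NA, and rescaling the free cutoffs is just one possible way to keep the dependent one non-negative.

```r
# Hypothetical sketch, not TDMR code. Assumption: the constraint is sum(cutoff) == 1.
adjustCutoff <- function(cutoff, n.class) {
  # With only n.class-1 values given, the last cutoff becomes the dependent one
  if (length(cutoff) == n.class - 1) cutoff <- c(cutoff, NA)
  dep <- which(is.na(cutoff))            # position of the missing (dependent) cutoff
  stopifnot(length(cutoff) == n.class, length(dep) == 1)
  free <- cutoff[-dep]
  # Rescale the free cutoffs if they alone exceed 1, so the dependent one cannot go negative
  if (sum(free) > 1) free <- free / sum(free)
  cutoff[-dep] <- free
  cutoff[dep]  <- max(0, 1 - sum(free))
  cutoff
}

adjustCutoff(c(0.2, NA, 0.3), n.class = 3)   # the second cutoff is the dependent one
adjustCutoff(c(0.9, 0.6), n.class = 3)       # the third cutoff is appended as the dependent one
```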

Fixed a bug: tdmPlotResMeta could crash, if not all .conf files had the same tuning pars. Fix: Now the x- and y-selectors in the twiddler interface are the union of all tuning pars. If an x- or y-selection is not part of the specific tuning pars for the selected .conf, an error message box is issued and spot is not started.
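The selector logic of this fix can be sketched in base R as follows. The .conf names, the parameter names and the helper isValidSelection are made up for illustration; the actual twiddler code in tdmPlotResMeta may differ.

```r
# Hypothetical tuning parameters per .conf file (illustration only)
parsByConf <- list(
  "a.conf" = c("MTRY", "NTREE"),
  "b.conf" = c("MTRY", "XPERC")
)

# The selectors offer the union of all tuning parameters
selectable <- Reduce(union, parsByConf)

# A selection is valid only if both axes belong to the selected .conf
isValidSelection <- function(confName, xPar, yPar) {
  all(c(xPar, yPar) %in% parsByConf[[confName]])
}

isValidSelection("a.conf", "MTRY", "NTREE")   # both parameters belong to a.conf
isValidSelection("a.conf", "MTRY", "XPERC")   # XPERC is not tuned in a.conf
```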

Added skipIncomplete-part in tdmPlotResMeta(). Fixed a bug (no mergedData) concerning nSkip in tdmPlotResMeta().

Fixed a bug concerning opts$READ.NROW: now this is applied also when loading <filetest>.RData

For regression: new option opts$rgain.type="made" (mean absolute deviation)
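As a rough illustration of this measure, assuming the deviation is taken between observed and predicted values (the changelog does not spell out the exact definition or any normalization used in TDMR), with a hypothetical function name made:

```r
# Minimal sketch of a mean absolute deviation between targets y and predictions yhat.
# Assumption: no normalization is applied.
made <- function(y, yhat) mean(abs(y - yhat))

made(c(1, 2, 3), c(1, 3, 5))   # absolute deviations are 0, 1 and 2
```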

Extended opts$rgain.string to work also for the regression options, adapted column names in theFinals accordingly.

tdmOptsDefaultsSet now returns opts as an object of class “tdmOpts”. Central TDMR files check for the right class of opts.

tdmRegressLoop.r, tdmClassifyLoop.r: More accurate averaging of evaluation measures for regression CV case, new variable ‘result$predictions’.

Bug fix ‘nfold=max(cvi,1)’ to avoid nfold=0 in the special case that all records in dset are training cases (zero validation cases)
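The effect of this fix can be seen in a two-line sketch. The encoding of cvi used here (0 marks a training record, k>0 marks a record in validation fold k) is an assumption made for the illustration.

```r
# Assumed encoding: 0 = training record, k > 0 = record belongs to validation fold k
cvi <- c(0, 0, 0, 0)        # special case: all records are training cases

nfoldOld <- max(cvi)        # evaluates to 0, i.e. no fold at all
nfold    <- max(cvi, 1)     # the fix: never less than 1
```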

Saving envT: parameter savePredictions (default =FALSE) allows to decide whether result$predictions and result$predProbList are saved to .RData.

Some small bug fixes concerning ‘predProb’ and ‘predictions’ for the case opts$ncopies>0. predProb is needed by tdmModConfmat, which is called from tdmClassify (in case opts$rgain.type="ar*", this will call tdmROCR_calc with predProb). predProbList is needed by tdmROCR.TDMclassifier.

Added in tdmSortedRFimport the option opts$SRF.scale to use scaled or unscaled importance.

Bug fix in tdmClassify: build EVALa correctly also in cases where nrow(d_test)==0 → set cm.test$* to NA and not cm.test to NULL.

 

V 0.3.1 – 01.06.2012

tdmSortedRFImport: negative importance values are now clipped to 0 (no longer additive shift of importance values).
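The difference between the two treatments can be sketched in base R. The importance values are invented, and the exact form of the former additive shift is an assumption (here: subtracting the minimum).

```r
# Invented importance values for three variables
imp <- c(A = 0.40, B = -0.05, C = 0.10)

# New behavior: negative values are clipped to 0, all others stay unchanged
clipped <- pmax(imp, 0)

# Former behavior (assumed form): shift everything so that the minimum becomes 0
shifted <- imp - min(imp)
```

Clipping leaves the positive importances untouched, whereas the shift changes every value.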

If tdm$parallelCPUs>1: snowfall would fail, if there is only one pass through sfSapply, i.e. if length(indVec)==1. Fix: Check in tdmCompleteEval whether length(indVec)==1, issue a warning and set tdm$parallelCPUs to 1.
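A minimal sketch of such a check, with a hypothetical helper adjustParallelCPUs and a plain list standing in for tdm (the actual code in tdmCompleteEval may look different):

```r
# Hypothetical helper (not a TDMR function): fall back to sequential execution
# when the parallel loop would consist of a single pass.
adjustParallelCPUs <- function(tdm, indVec) {
  if (tdm$parallelCPUs > 1 && length(indVec) == 1) {
    warning("only one pass through the parallel loop, setting tdm$parallelCPUs to 1")
    tdm$parallelCPUs <- 1
  }
  tdm
}

tdm <- suppressWarnings(adjustParallelCPUs(list(parallelCPUs = 4), indVec = 1))
```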

Renamed bind_response to tdmBindResponse (tdmGeneralUtils.r)

Bug fix ‘path → tdm$path’ in tdmMapDesLoad (tdmMapDesign.r)

Bug fix for cma_es (package cmaes): When running demo/demo04cpu.r with tuner cmaes, we got “Error in eigen.log[iter, ] <- rev(sort(e$values)): subscript out of bounds”. Solution: control$maxit = round(control$maxit), because this error only occurs if control$maxit is NOT an integer.

Fixed a bug concerning opts$filesuffix (tdmOptsDefaultsFill) which could lead to an unwanted stop.

Bug fix: regression tuning produced strange results (names in data frame) if you tuned only 1 variable (cpu, roi with only XPERC). Now fixed.

Bug fix: with cma_es (and other tdmStartOther tuners), the BST data frame usually did not include the last design points (which usually are formed after the last time where “des$CONFIG %% tdm$spotConfig$seq.design.new.size==0” was TRUE). Now fixed.

Bug fix in tdmDispatchTuner: cma_jTuner did not yet return a list of type spotConfig in tunerVal. Consequence: the above “Append” would not work. Now fixed.

tdmDispatchTuner.r: Made all tuners return a list of type spotConfig with the proper settings in tunerVal$alg.currentResult and tunerVal$alg.currentBest.

Bug fix in tdmMapDesApply: the “[-1]” in “dn=setdiff(names(des[-1]), c("COUNT","CONFIG",...))” was wrong.

Extended tdmPlotResMeta by a slider y_10Exp, which allows multiplying the y-values by 10^0, 10^1, …, 10^3 on the fly in the twiddler interface. This usually gives a better color scheme for the 3D-plot in spotReport3d.

 

V 0.3.0 – 08.05.2012

o   New ROC chart and lift chart capabilities, based on package ROCR on CRAN, see help(tdmROCR).

o   New measures for opts$rgain.type= “arROC”, “arLIFT”, “arPRE” for area under ROC, lift or precision-recall chart, based on package ROCR.

o   Improved and extended the set of demos (demo00, … , demo06). New demos for interactive visualization.

o   Improved cma_jTuner (CMA-ES, Java version). Works now on Linux and Windows OS platforms when using tdm$fileMode=FALSE.

o   Improved tdmPlotResMeta (confFile, nSkip, chkSkip, xAxis, yAxis).

o   Changed opts$fct.postproc: this is now the name of the postprocessing function and not the function itself. Reason: If opts contains a function, then it also contains the function's environment, which can be pretty big (contains envT, …) and makes the .RData file saved for envT big.

o   Flag opts$DO.POSTPROC is now deprecated, use instead opts$fct.postproc.

o   Fixed a bug concerning opts$filesuffix (tdmReadData, could lead to overwriting of opts$filename).

o   Improved the examples section in TDM-docu.html. Now most examples in TDM-docu.html are in sync with the set of demos. Separated in TDM-docu.html the example usage description from the example details. New chapter describing the interactive visualization example.

o   Improved the package documentation (simpler index via @keywords internal, many small fixes).
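The change to opts$fct.postproc listed above (storing the function name instead of the function) can be sketched as follows. The function myPostproc, its one-argument signature and the helper applyPostproc are hypothetical and only illustrate the lookup by name; TDMR's real postprocessing interface may differ.

```r
# Hypothetical postprocessing function (signature assumed for illustration)
myPostproc <- function(x) pmax(x, 0)

# opts stores only the name, so saving opts does not drag a closure environment along
opts <- list(fct.postproc = "myPostproc")

# Hypothetical helper: resolve the name to a function at call time
applyPostproc <- function(x, opts) {
  if (is.null(opts$fct.postproc)) return(x)        # nothing to do
  f <- get(opts$fct.postproc, mode = "function")   # look the function up by name
  f(x)
}

applyPostproc(c(-1, 2), opts)
```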

 

V 0.2.1 – 15.03.2012

o   New 3D graphics for tuning results and their metamodels, using a twiddler-interface on environment envT: see help(tdmPlotResMeta).

o   New print() for TDMdata object “dataObj”

o   Fixed a bug in tdmClassify (wrong ifelse in applySVM).

o   Fixed some minor bugs to reactivate parallel mode:  some sfExports were missing.

o   Fixed the saveEnvT-bug (“[9:9]”) in tdmCompleteEval. New option tdm$filenameEnvT.

o   Fixed the tdmMapDesign bug (Design variables missing in tdmMapDesign.csv and userMapDesign.csv would not be mapped to opts. Now missing variables are detected and an error is thrown.)

o   Added opts$SPLIT.SEED variable: a variable to decide if tdmSplitTestData runs in deterministic mode 

o   Added opts$TST.trnFrac: now trnFrac can be smaller than 1-opts$TST.valiFrac.

o   Added SAVESEED-part in tdmSplitTestData, tdmClassifyLoop, tdmRegressLoop

o   Added tdm$stratified with new meaning: if not NULL, make stratified sampling w.r.t. the column of dset named in tdm$stratified.

o   Some minor fixes concerning data reading

o   TDMR documentation now available in PDF and HTML format (TDM-docu.html)

 

V 0.2.0 – 06.02.2012

o   integration of SFA (slow feature analysis, see package rSFA on CRAN) as a feature generation method for classification

o   bug fix concerning tdmMapDesign; extension of tdmMapDesign.csv

o   moved PCA feature generation from main_* into tdmClassifyLoop, it uses now only the training data for establishing the PCA rotation (same for SFA)

o   new training / validation / test set capabilities, see Section “TDMR Data Reading and Data Split …” in TDM-docu.html and help(tdmSplitTestData), help(tdmReadData).

o   modified TDMR’s seed concept, new option opts$*.seed = “algSeed” (get the seed from spotConfig$alg.seed)

o   new parameter tdm$mainFunc, simpler and more general usage (as compared to tdm$mainFile and tdm$mainCommand)

o   powell, cmaes, rSFA now in the “Depends” list of DESCRIPTION

o   added a TDMR-package description (file tdmGeneralUtils.r)

 

V 0.1.3 – 04.01.2012

o   extended documentation (e.g. full docu for tdmOptsDefaultsSet and many other small documentation extensions)

o   new section opts$CLS.* for classification-related settings

o   bug fixes in demo01cpu (seed variation) and demo02sonar (GD.DEVICE)

o   merged former functions unbiasedBestRun_C and unbiasedBestRun_R into only one function unbiasedRun

o   extended functions for information on class objects: print.TDMclassifier, print.tdmClass, print.TDMregressor, print.tdmRegre              

o   removed the dependencies on packages matlab and mlbench

V 0.1.2 – 05.12.2011

o   new function tdmParaBootstrap.r: add parametric bootstrap patterns, if opts$ncopies>0

o   new version of TDM-docu.pdf: see documentation index – directory

o   new demo: demo00sonar (with some graphics)

o   fix in print.TDMclassifier, print.TDMregressor: optional argument ‘type’

o   doc/index.html added

o   doc/changes.html added (this file)

V 0.1  – 10.11.2011

o   initial version