# This file contains all tips which can be shown during startup.
# Comment lines must start with '#'.
# Tips are separated by empty lines. Empty lines must not be used
#      within a tip. Please use HTML line breaks instead.
<p>
Many general settings can be defined in the settings dialog:<br>
<i>Tools</i> --&gt; <i>Preferences</i>.
</p>

<p>
Some users prefer to run their experiments in exactly the same way every time, even if the
experiment contains randomized heuristics, so that the results can be
reproduced. For this purpose, the experiment root operator provides a parameter
&quot;random_seed&quot; which determines the sequence of random numbers. A
random seed of -1 means that no fixed seed is used at all and the experiment is fully
randomized. The default random seed can be defined in the settings dialog: <br>
<i>Tools</i> --&gt; <i>Preferences</i>.
</p>
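The seed can also be fixed directly in the experiment XML. A minimal sketch, assuming the root operator's class is named <i>Experiment</i> (only the &quot;random_seed&quot; key is documented in these tips):

```xml
<!-- Sketch only: the class name "Experiment" is an assumption -->
<operator name="Root" class="Experiment">
  <!-- fixed seed for reproducible runs; -1 would disable seeding entirely -->
  <parameter key="random_seed" value="2001"/>
  <!-- ... inner operators ... -->
</operator>
```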

<p>
The different tabs provide two experiment editors (tree view and XML editor),
another experiment viewer (box view), a result viewer, and a system
monitor. All intermediate and end results are displayed in the result tab.
</p>

<p>
You can add a new operator from the context menu of the selected operator
(right click) in the tree view. Please note that this option is only available
for operator chains, i.e. operators which are able to contain inner
operators.
</p>

<p>
Another way to add operators is the &quot;Insert Operator&quot; dialog which
is available via several icons. This dialog provides several search schemes to
find an operator which suits your needs.
</p>

<p>
Instead of adding a new operator you can also replace the selected operator by
another one. This is only possible in the operator context menu in the tree
view (right click).
</p>

<p>
Yale contains all Weka learners and attribute weighting schemes. They can
simply be used like other Yale operators. Please note that Weka meta learners
must contain another Weka learner. For the documentation of the Weka operators
and parameters please refer to the Weka Javadoc documentation.
</p>

<p>
The operator <i>FeatureGeneration</i> can construct new features from
numerical data. All functions supported by Java are available. Expressions use
prefix notation, for example &quot;sin(+(att1,att2))&quot;. Please refer to the
documentation of this operator in the Yale tutorial.
</p>
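As an illustrative sketch of such a generation expression in an experiment file, assuming a hypothetical list parameter named &quot;functions&quot; (the actual parameter key is documented in the Yale tutorial, not in these tips):

```xml
<!-- Hypothetical sketch: the list key "functions" and the new attribute name are assumptions -->
<operator name="Generation" class="FeatureGeneration">
  <list key="functions">
    <!-- prefix notation: sin applied to the sum of att1 and att2 -->
    <parameter key="sin_sum" value="sin(+(att1,att2))"/>
  </list>
</operator>
```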

<p>
Yale distinguishes between two types of attributes: regular attributes used by
learning schemes and special attributes. The latter can have arbitrary names and
are required by some operators. For example, a learning scheme needs at least a &quot;Label&quot;
attribute and clusterers usually need an &quot;Id&quot;.
</p>

<p>
Yale can easily be extended by plugins. These contain additional operators for
some special learning tasks. For now, plugins for Value Series learning,
Clustering, and Word Vector creation are provided. Please check the Yale
website for new plugins: <br>
<i>http://yale.cs.uni-dortmund.de</i>
</p>

<p>
Complex experiments sometimes need to write data or other results into a file
and reload it later. For this purpose, %-expansion of parameter values is supported:
<ul>
<li>%a is replaced by the number of times the operator was applied</li>
<li>%t is replaced by the current system time</li>
<li>%n is replaced by the name of the operator</li>
<li>%c is replaced by the class of the operator</li>
</ul>
Please refer to the Yale tutorial for a full description of all expansions.
</p>
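For example, a writer can produce a different file per application by using %a in its file parameter. A sketch, assuming the <i>ExampleSetWriter</i> operator (mentioned in another tip) takes a file parameter named &quot;example_set_file&quot; (that key is an assumption):

```xml
<!-- Sketch: the parameter key "example_set_file" is an assumption -->
<operator name="Writer" class="ExampleSetWriter">
  <!-- %a expands to the number of times the operator was applied,
       so each application writes to a different file -->
  <parameter key="example_set_file" value="results_%a.dat"/>
</operator>
```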

<p>
To estimate the performance of a learning scheme several validation schemes
are provided:
<ul>
<li><i>XValidation</i>: A k-fold cross validation divides the data into k
parts and uses k-1 parts for learning and the remaining part for estimating the
performance. This is iterated over all parts and the results are
averaged. For classification tasks, a stratified cross validation can
also be applied.</li>
<li><i>Leave-one-out</i>: Basically a cross validation with k = number of examples.</li>
<li><i>SimpleValidation</i>: A SimpleValidation randomly splits up the example set into a training and test set and evaluates the model.</li>
<li><i>FixedSplitValidation</i>: A FixedSplitValidation splits up the example set at a fixed point into a training and test set and evaluates the model.</li>
</ul>
Similar validation operators are available for the evaluation of
feature operators.
</p>
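A cross validation is itself an operator chain. The following sketch shows a typical nesting; the inner class names (the learner placeholder, <i>OperatorChain</i>, <i>ModelApplier</i>) and the parameter key &quot;number_of_validations&quot; are assumptions, since these tips only document <i>XValidation</i> and <i>PerformanceEvaluator</i>:

```xml
<!-- Sketch of a 10-fold cross validation; the key and inner class names are assumptions -->
<operator name="XVal" class="XValidation">
  <parameter key="number_of_validations" value="10"/>
  <!-- first child: the learning scheme to be evaluated -->
  <operator name="Learner" class="..."/>
  <!-- second child: apply the model and measure its performance -->
  <operator name="Evaluation" class="OperatorChain">
    <operator name="Applier" class="ModelApplier"/>
    <operator name="Evaluator" class="PerformanceEvaluator"/>
  </operator>
</operator>
```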

<p>
To estimate the performance of feature selection and construction wrappers,
another cross validation around the wrapper must be used. Although it is
possible to nest several cross validations, sometimes it might be more
convenient to use one of the following operators:
<ul>
<li><i>WrapperXValidation</i>: Encapsulates a cross-validation experiment to
evaluate a feature weighting or selection method (wrapper).</li>
<li><i>SimpleWrapperValidation</i>: A simple validation method to check the performance of a feature weighting or selection wrapper.</li>
</ul>
These operators ensure that the weighting / selection is applied to the
independent test set.
</p>

<p>
Only one operator is used for all performance calculations:
<i>PerformanceEvaluator</i>. This operator provides many performance criteria
including accuracy, precision, and recall for classification tasks and several
error measures for regression tasks. Please make sure that you select the correct
parameters for the task at hand.
</p>

<p>
Attribute selection (feature selection) is treated as a special case of attribute
weighting, which allows more flexible operators. Feature operators like forward
selection, genetic algorithms, and the weighting operators can deliver an
example set with the selection / weighting already applied, or optionally the
original example set. Therefore, all feature operators deliver the IO
object <i>AttributeWeights</i>, not only the weighting ones. A weight of 0
means that the attribute should be deselected.
</p>

<p>
The preprocessing operators in Yale include
<ul>
<li><i>Discretization</i>: discretize numerical attributes into a user defined
number of bins</li> 
<li><i>Example filter</i>: keeps only examples which fulfill a specified
condition</li> 
<li><i>Feature filters</i>: deselects features not fulfilling a given
condition</li> 
<li><i>Feature selection, construction, and weighting</i>: many feature
operators including (evolutionary) wrappers and weighting filters</li> 
<li><i>Value replenishment</i>: replaces infinite or missing values</li>
<li><i>Sampling</i>: several schemes to sample a subset of the examples</li>
<li><i>Normalization and standardization</i>: normalize into given intervals
or standardize data</li>
<li><i>Changing the value type of the attributes</i>: maps all values to real
values, nominal to binary values, etc.</li>
<li><i>IdTagging</i>: adds an Id to the examples</li>
</ul>
among others.
</p>

<p>
In addition to the built-in Yale plotters for data and statistics
another visualization operator was added: <i>JViToPlotter</i>. This operator
tries to display its input and provides a large set of 2D and 3D plotters for
high-dimensional data.
</p>

<p>
One of the most useful operators for experiment evaluation is the
<i>ExperimentLogOperator</i> which is able to record almost arbitrary
data. The collected data can directly be plotted in the graphical user
interface (online plotting). This is often the best way to check if the
experiment seems to produce good results.
</p>

<p>
Each time the <i>ExperimentLogOperator</i> is applied, all the values and
parameters specified by the list <var>log</var> are collected and
stored in a data row. Therefore this operator should be placed at a
position where it is able to collect the necessary data.
<br>
Since using this operator is somewhat tricky please refer to the
documentation and the experiments in the sample directory of Yale.
</p>
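A sketch of such a log list; the &quot;operator.&lt;name&gt;.value.&lt;value&gt;&quot; syntax for addressing another operator's values is an assumption here and should be checked against the tutorial:

```xml
<!-- Sketch: the value-addressing syntax in "value" is an assumption -->
<operator name="Log" class="ExperimentLogOperator">
  <list key="log">
    <!-- record the performance value provided by an operator named "Evaluator" -->
    <parameter key="performance" value="operator.Evaluator.value.performance"/>
  </list>
</operator>
```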

<p>
Since massive logging may slow down experiments, the default log
verbosity for new experiments is &quot;init&quot;. The log verbosity
can be defined as a parameter of the root experiment operator. The
higher the verbosity is set, the fewer messages are logged. The
possible settings are:
<ul>
<li>MINIMUM: logs everything</li>
<li>IO: shows all input and output of each operator</li>
<li>STATUS: displays operator messages</li>
<li>INIT: displays initialization infos and end results</li>
<li>WARNING: displays all warnings</li>
<li>EXCEPTION: displays all exceptions</li>
<li>ERROR: displays all errors</li>
<li>FATAL: displays all fatal errors which will definitely stop the
experiment</li>
<li>MAXIMUM: logs almost nothing</li>
</ul>
</p>
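A sketch of setting the verbosity on the root operator; the parameter key &quot;logverbosity&quot; and the class name &quot;Experiment&quot; are assumptions:

```xml
<!-- Sketch: the key and class names are assumptions -->
<operator name="Root" class="Experiment">
  <!-- log only warnings and worse to keep long experiments fast -->
  <parameter key="logverbosity" value="warning"/>
  <!-- ... inner operators ... -->
</operator>
```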

<p>
Please ensure that the correct performance criteria for the task at hand were
defined. For example, criteria like accuracy or precision are suitable for
classification tasks, while criteria like absolute or squared error are suitable for
regression tasks.
</p>

<p>
Examples with an Id can be inspected from the plotter by double
clicking the example in the plot. To enable this, an <i>ExampleVisualization</i>
operator must have been added. Plugins may add more appropriate example
visualizers, for example music players for audio data or image / text
viewers.
</p>

<p>
All 2D plotters provide zooming functionality. Simply drag a rectangle to
zoom into the selected region. Right clicking sets the range back to
maximum size.
</p>

<p>
User descriptions (comments) can be added and edited in the operator info
screen (F1). The description of the root operator is shown in a dialog after
loading the experiment. This feature can be disabled in the settings dialog.
</p>

<p>
Yale provides two different modes for experts and beginners. In expert
mode, all parameters are shown. In beginner mode, only the most important
parameters are shown.
</p>

<p>
You can save your experiment as a template. Experiments which were saved
as templates can be used by the wizard. This allows quick experiment setup for
similar experiments.
</p>

<p>
All operators, parameters, and GUI elements provide useful information
as tool tip text. Rest the mouse cursor on an element for a few moments to
display a short description.
</p>

<p>
Operator information, including a short description, is displayed in the
operator info dialog. This dialog also contains error descriptions in case
the experiment validation found an error. The dialog for the selected operator
can be opened with F1 or from the context menu (right click).
</p>

<p>
Experiments should be validated before they are started (via the icon in the
tool bar, the Tools menu, or F4). Errors are marked with an exclamation mark
in the operator tree. A short error description can also be found in the
operator info screen (F1).
</p>

<p>
Most data input operators support a parameter &quot;datamanagement&quot;.
Usually, all examples are encoded as numerical arrays, which is a fast
way of data management.
For sparse data sets, i.e. where many attributes of an example take a
default value, it might be more efficient to use one of the sparse
data management types. We recommend 'sparse_array' in these cases. It uses
less memory and is still very efficient.
</p>
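A sketch of switching an input operator to sparse storage; the operator class, both parameter keys, and the 'sparse_array' value all appear in these tips, and only the file name is a placeholder:

```xml
<operator name="Input" class="ExampleSource">
  <!-- "mydata.aml" is a placeholder attribute description file -->
  <parameter key="attributes" value="mydata.aml"/>
  <!-- store examples sparsely instead of as dense numerical arrays -->
  <parameter key="datamanagement" value="sparse_array"/>
</operator>
```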

<p>
Yale usually uses only 128 MB of main memory even if your computer provides
more memory. For big data sets or complex experiments this amount of memory
might not be sufficient. To increase the amount of memory which can be used by
Yale simply set the value of the environment variable MAX_JAVA_MEMORY, for
example <br>
MAX_JAVA_MEMORY=512<br>
You can set the environment variable either in your system settings or in the
start script of Yale. <br>
<b>Important:</b> Increasing the amount of memory for Yale only works if Yale
was started with one of the start scripts, and not, for example, when double
clicking the jar file! If you want to increase Java memory in general please
use the &quot;-Xmx&quot; option of your Java interpreter.
</p>

<p>
For classification tasks the prediction confidences are automatically set for
all classes. Most learners directly provide correct confidence values, but in
some cases it is necessary to set a boolean parameter to obtain correct
confidence values.
</p>

<p>
Yale supports different types of meta data. Each attribute at least has a name
and a value type (e.g. numerical or nominal). Further meta data define if an
attribute is a singular value or part of a value series. Even units can be
defined, which might support automatic feature construction
approaches.
</p>

<p>
Example sets can be displayed in three different ways: the meta data view, the
data view, and the plot view. In meta data view each line describes an
attribute and some basic statistics for this attribute. In data view each line
is an example containing all attribute values. The plot view can be used to
display different plots in the selected dimensions.
</p>

<p>
Yale provides some built-in plotters for data and results: 2D color plots,
histograms or distribution plots, and scatter plots. The <i>JViToOperator</i>
adds several 2D and 3D plots for your data and models. If GnuPlot is installed
it is also possible to display online 3D plots.
</p>

<p>
One of the main features of Yale is its ability to arbitrarily nest operator
chains and build complex operator trees. This allows for example the
optimization of a feature set and the parameters at the same time. You can
simply exchange single operators to evaluate how each operator performs on
your task. The rest of the experiment remains the same.
</p>

<p>
You should start with the online tutorial (help menu) to learn about some
basic concepts of Yale.
</p>

<p>
The standard data file format of Yale is a set of data files together with a
meta data description in XML. The format of the data files is very
flexible and data can be merged from several files. The input operator
<i>ExampleSource</i> can be configured to allow arbitrary column
separators (via regular expressions), quote characters, and comment
characters.
</p>

<p>
The XML attribute meta data description can be created with the help of the
Attribute Editor. Please click on the &quot;Edit&quot; button of the parameter
&quot;attributes&quot; of the operator <i>ExampleSource</i>. This opens a
dialog for loading data and setting the attribute meta data. The attribute
description XML file can be saved and is automatically loaded by the
<i>ExampleSource</i> operator.
</p>

<p>
Almost all objects which can be passed between operators can be saved into
and loaded from files. Please check the &quot;IO&quot; group in the operator
menus.
</p>

<p>
Yale contains many data input and output operators. 
Data can be loaded from several file formats including ARFF, sparse, CSV,
dBase, C4.5, and BibTeX. It can also be loaded directly from a database.
</p>

<p>
Data can be written in almost arbitrary formats using the
<i>ExampleSetWriter</i> operator. A writer for ARFF files is also provided.
</p>

<p>
A special operator <i>IOConsumer</i> exists to consume output objects which
are not used any longer. This allows even more complex experiments.
</p>

<p>
Meta optimization schemes like parameter optimization operators usually need
inner operators which are to be optimized. These inner operators must provide
a performance measure (wrapper approach).
</p>

<p>
The <i>IteratingOperatorChain</i> can be used to perform the action of the
inner operators <i>n</i> times.
</p>

<p>
In contrast to the parameter optimization operators the operator
<i>ParameterIterator</i> just iterates through a given set of parameters
without performing any search for an optimal parameter set. This might be
useful in cases where performance or other characteristics should be
plotted against a parameter, e.g. inter-cluster density against the parameter
<i>k</i> of k-Means clustering.
</p>
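A sketch of such an iteration; the key &quot;parameters&quot; and the &quot;Operator.parameter={...}&quot; value syntax are assumptions and should be checked against the operator documentation:

```xml
<!-- Sketch: the key name and value syntax are assumptions -->
<operator name="Iteration" class="ParameterIterator">
  <!-- run the inner operators once for each value of k -->
  <parameter key="parameters" value="Clustering.k={2,3,4,5}"/>
  <operator name="Clustering" class="..."/>  <!-- e.g. a k-Means operator -->
</operator>
```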

<p>
<i>Discretization</i> operators discretize numerical attributes into a user
defined number of bins.
</p>

<p>
Example filter operators keep only examples which fulfill a specified
condition.
</p>

<p>
Feature filter operators deselect features not fulfilling a given condition.
</p>

<p>
Feature selection, construction, and weighting operators are a frequently used
way to greatly improve learning accuracy. Yale contains many feature operators
including (evolutionary) wrappers and weighting filters.
</p>

<p>
Once a set of feature weights has been built, you can use the operator
<i>AttributeWeightSelection</i> to deselect all features whose weights do not
fulfill a given relation. For example, you can decide to keep only those
features with a weight greater than 0.5.
</p>
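A sketch of keeping only features weighted above 0.5, as in the example above; the parameter keys &quot;weight&quot; and &quot;weight_relation&quot; are assumptions:

```xml
<!-- Sketch: both parameter keys are assumptions -->
<operator name="WeightSelection" class="AttributeWeightSelection">
  <!-- keep only attributes whose weight is greater than 0.5 -->
  <parameter key="weight_relation" value="greater"/>
  <parameter key="weight" value="0.5"/>
</operator>
```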

<p>
Value replenishment operators replace infinite or missing values.
</p>

<p>
Yale provides several schemes to sample a subset of the complete example set.
</p>

<p>
<i>Normalization</i> and <i>standardization</i> normalize into given intervals
or standardize data to zero mean and standard deviation 1 (z-transform).
</p>

<p>
You can easily change the value types of attributes with the corresponding
preprocessing operators.
</p>

<p>
One of the main features of Yale is the multi-layered view concept. This
allows the almost arbitrary nesting of complex operator chains while keeping
data handling efficient. Data is never copied (e.g. for cross validation or
feature selection) but only different views are used on the same data
table. Please check the sample directory for a small excerpt of all possible
experiments.
</p>

<p>
Yale experiments are described in an XML format which can be used as a
scripting language for data mining experiments. You can see Yale as an
interpreter for this data mining scripting language and use it from your own
application.
</p>

<p>
Usually only one operator must be changed to compare two algorithms. The rest
of the experiment remains the same.
</p>
