Major update Oct 1-11, 2014 (RZ). Modifications summarized in "CHANGES" file.

da7cf576 · Richard Zanibbi · f1414bff
--- a/CHANGES
+++ b/CHANGES
+CHANGES File
+
+Oct. 1-9, 2014
+R. Zanibbi
+
+Major work on the library recently, including:
+
+	- Changed metric output format for evallg.py, adding information for
+	  detection f-measures; structure, object and relation detection and
+	  detection + classification rates; and relative correct class/detection rates
+	
+	- Modified file outputs for evaluate script:
+		- Now creating a human-readable spreadsheet (.csv) with all raw metrics,
+		  to facilitate analysis in spreadsheet programs, statistical packages, etc.
+		- Producing list of processed files with Correct/Incorrect indication
+		- Separate directory for .dot graphs for files with errors
+		- Confusion matrix file now has a separate plot for just relationship vs.
+		  relationship confusions on edges, along with the 'full' edge 
+		  label confusion matrix. 
+
+		  * Significant effort to clarify correct merge but wrong class vs.
+		    segmentation disagreements in the Summary.txt and ConfusionMatrix.html
+			files (in Summary, for directed and undirected edges).
+
+	- Created scripts to automate relabeling 'old' N/E format CROHME files
+	  using '*' for merge edges (relabelEdges and relabelOldCROHME in bin/)
+
+	- Debugging of metric computations
+
+		- D_S (merge edge rate) was incorrect when segments are defined
+		  inconsistently (e.g. only 'merge' in one direction, vs. both).
+		  Corrected.
+
+		- Removed computation of percentage metrics to reduce the size of the
+		  raw results spreadsheet. These are easily computed, along with
+		  descriptive statistics and distributions, in standard
+		  spreadsheets/statistical packages.
+
+		- Fixed various small errors that occurred when files were empty, were
+		  missing primitives, or contained additional primitives.
+	
+	- lg.csvObject() function will write out lg data in the new O/R format,
+	  with a brief explanation of the format in the file.
+
+	- Final touches on debugging/refining lg2dot, which I modified significantly
+	  in early September to produce more readable output that is more consistent
+	  across the different graph types.
+
+	- Created file summarizing 'raw' metric data types and key data structures
+	  in lg.py (which tend to be hard to remember).
+
+	- **Added ability to use 'evaluate' or 'confHist' with either a pair
+	  of directories, or a list of files.
+
+	- Added lg2OR and lg2NE converters, so that files can be pretty printed
+	  and/or converted easily between formats. Objects are still identified
+	  by common labels on edges and nodes (i.e. '*' for merging primitives
+	  is deprecated, although still accepted at input time).
+	
+	- Added cdiff, ldiff and vdiff tools for finding instances/files with common
+	  errors using regular expression matching over .diff files produced by the
+	  'evaluate' script.
+
+    - *Debugged lg2txt.py, which was not working properly with some labels
+	  (in particular, for square roots using 'Inside' relationships). Also
+	  updated the translate/mathMLMap.csv file accordingly.
+
+	- Added ability to export a file list from structure confusion histogram output.
--- a/LICENSE
+++ b/LICENSE
 License and Copyright

-The LgEval evaluation toolkit is Copyright (c) 01/06/2013, Richard Zanibbi
+The LgEval evaluation toolkit is Copyright (c) 01/06/2013, 01/10/2014 Richard Zanibbi
 and Harold Mouchère. LgEval is free software; you may redistribute it and/or
 modify it under the terms of the Creative Commons CC BY-NC-SA 3.0
 (Attribution-NonCommercial-ShareAlike 3.0 Unported)

--- a/README.html
+++ b/README.html
--- a/README.md
+++ b/README.md
--- a/README_MetricsData.txt
+++ b/README_MetricsData.txt
+-------------------------------------------------------------------
+  README: Summary of Metrics and Key Data Structures
+  LgEval 0.3.2
+
+  First Version: Oct. 10, 2014 (R. Zanibbi)
+  Copyright (c) 2014 Richard Zanibbi and Harold Mouchere
+-------------------------------------------------------------------
+
+LgEval uses a simple file-based approach to evaluating individual files and
+collections of files.
+
+The program src/evallg.py provides the main functions from which metrics are
+computed for each input file, producing a metrics file (<infile>.csv) and a
+file listing the specific differences between the input and its target
+(<infile>.diff, another .csv file).
+
+When using the 'evaluate' script, metrics and differences for individual files
+are written to a <Results_Dir>/Metrics directory. These individual results are
+concatenated and then used to produce the raw results spreadsheet and summary
+files output by the 'evaluate' script.
+
+:: Adding Metrics ::
+
+Metrics may be added by modifying the lists of named metrics in the functions
+lg.compare() and lg.compareSegments() in src/lg.py. Once added to one of these
+lists, they will automatically be computed and compiled when using the
+'evaluate' script.
+
+
+DIFF FILE FORMAT (.diff)
+---------------------------
+
+.diff files are in CSV format, representing one pair of disagreeing labels per 
+line.
+
+Node label errors:
+
+	*N, Primitive ID, OutputLabel, OutputWeight, :vs:, TargetLabel, TargetWeight
+
+Edge label errors (single line in .diff file):
+
+	*E, Primitive ID (parent), Primitive ID (child), OutputLabel, OutputWeight, :vs:,
+		TargetLabel, TargetWeight
+
+Segmentation errors showing primitive pair where output and target disagree on 
+whether the two primitives belong to the same object:
+
+	*S, Primitive ID (parent), Primitive ID (child)
+
+NOTE: because primitive merges are represented by bidirectional edges, normally
+if *S,1,2 appears in a .diff file, *S,2,1 will also appear in the file.
+
+
+METRIC DESCRIPTIONS
+--------------------------------------------------
+
+Metrics have been organized into groups below. Please note that they currently
+do not appear in this order in 00_RawResults.csv.
+
+Targets
+---------
+nNodes                  Number nodes in ground truth (comparison) file
+nEdges                  Number edges in ground truth (comparison) file
+nSeg                    Number objects (segments) in ground truth (comparison) file
+nSegRelEdges            Number object relationships in ground truth (comparison) file
+
+Detections
+-----------
+detectedSeg             Number objects detected in input file
+dSegRelEdges            Number object relationships in input file
+
+Primitive Metrics
+-------------------
+D_B                     Number of incorrect node and edge labels
+D_C                     Number of incorrect node/primitive labels
+D_L                     Number of incorrect edge labels
+D_R                     Number of incorrect relationship edges
+D_S                     Number of incorrect 'merge/object' edges 
+D_E(%)                  Weighted accuracy over node and edge labels 
+                        (see DRR 2013 paper)   
+nodeCorrect             Number primitives (nodes) correctly labeled
+
+segPairErrors           Number incorrectly merged/split primitive pairs 
+edgeDiffClassCount      Number valid directed merge edges with incorrect object type  
+undirDiffClassCount     Number valid undirected merge edges with incorrect object type
+dPairs                  Number incorrect *undirected* node pairs
+
+
+Object Metrics
+------------------
+CorrectSegRelLocations      Number correct relationship locations
+CorrectSegRels              Number correctly located and labeled relationships 
+CorrectSegments             Number correctly segmented objects
+CorrectSegmentsAndClass     Number correctly segmented and labeled objects
+ClassError                  Number incorrectly classified objects
+SegRelErrors                Number incorrectly detected segment relationships
+
+Flags (1/0)
+-------------
+hasCorrectSegments          Object locations are correct
+hasCorrectSegLab            Object locations and labels correct
+hasCorrectRelationLocations Object relationship locations correct
+hasCorrectRelLab            Object relationship locations and labels correct
+hasCorrectStructure         Object *and* relationship locations correct
+
+
+
+
+LGEVAL KEY DATA STRUCTURES
+------------------------------------
+
+Label Graph Attributes (see lg.py)
+-----------------------------------
+lg.error            Error flag
+lg.cmpNodes         Function used to compare node labelings 
+                    (default: Hamming distance, #disagreeing labels)
+lg.cmpEdges         Function used to compare edge labelings
+                    (default: Hamming distance)
+
+lg.nlabels          Dictionary from node (primitive) identifiers to
+                    another dictionary mapping labels to confidence values
+                    (floating point values)
+lg.elabels          Dictionary from primitive pairs (edges) to
+                    another dictionary mapping labels to confidence values
+                    (floating point values)
+lg.absentNodes      Set of identifiers not present in the lg relative
+                    to another lg (e.g. relative to ground truth)
+
+
+Graph Segment Output (output of lg.segmentGraph())
+----------------------------------------------------
+segmentPrimitiveMap    Dictionary from object identifier to a pair (a,b)
+                       a: list of labels;  b: list of primitives in the object
+primitiveSegmentMap    Dictionary from primitive identifier to 
+                       another dictionary mapping a label to the object id
+                       associated with the primitive for this relationship type
+rootSegments           Set of objects with no incoming edges
+segmentEdges           Dictionary from pairs of object identifiers to
+                       another dictionary mapping relationship labels to 
+                       confidence values
+
+'compareSegments' output (lg.compareSegments())
+------------------------------------------------
+edgeDiffCount       Number of disagreeing 'merge/object' edges        
+segDiffs            Dictionary mapping primitives to pairs of
+                    primitives belonging to objects ( diff1, diff2 )
+correctSegments     Set of (Obj id, label) pairs for correct segments
+metrics             List of metric (name, value) pairs - these are a
+                    subset of the metrics described above
+primRelEdgeDiffs    Dictionary mapping primitive pairs (edges) to an
+                    error entry (should probably be simplified)
+
+'compare' output (lg.compare())
+---------------------------------
+metrics             All metrics described above
+nodeconflicts       List of pairs (node id, [ (l1, 1.0), (l2, 1.0) ])
+                    where l1 and l2 are disagreeing labels
+edgeconflicts       List of pairs ((nid_1, nid_2), [ (l1, 1.0), (l2, 1.0) ])
+                    where l1 and l2 are disagreeing labels
+segDiffs            Produced by compareSegments() (see above)
+correctSegs         Produced by compareSegments() (see above)
+segRelDiffs         Same as primRelEdgeDiffs from compareSegments() 
+                    (see above)
+
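The .diff record layout in README_MetricsData.txt is easy to inspect with standard tools. The Bash sketch below assumes, as the format lines show, that the record type (*N, *E or *S) is the first comma-separated field, and counts each kind of record in one file. The function name countDiffErrors and the sample records are hypothetical and not part of LgEval.

```shell
#!/bin/bash
# Hypothetical helper: tally the error records in one LgEval .diff file.
# Assumes the record type (*N, *E or *S) is the first comma-separated field.
countDiffErrors() {
	awk -F',' '
		{ gsub(/ /, "", $1) }    # strip any spaces from the record-type field
		$1 == "*N" { n++ }       # node label disagreements
		$1 == "*E" { e++ }       # edge label disagreements
		$1 == "*S" { s++ }       # segmentation disagreements (directed pairs)
		END { printf "N=%d E=%d S=%d\n", n, e, s }
	' "$1"
}

# Example with a small, made-up .diff file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
*N,3,x,1.0,:vs:,y,1.0
*E,1,2,Right,1.0,:vs:,Sup,1.0
*S,1,2
*S,2,1
EOF
countDiffErrors "$sample"    # prints: N=1 E=1 S=2
rm -f "$sample"
```

The S count will normally be even, since a segmentation disagreement such as *S,1,2 is usually accompanied by *S,2,1.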
--- a/bin/cdiff
+++ b/bin/cdiff
+#!/bin/bash
+
+if [ $# -lt 3 ]
+then
+	echo "LgEval cdiff: Compile Errors from Diff Files"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
+	echo ""
+	echo "Usage: cdiff [-NESC^] outputPattern targetPattern <files>"
+	echo ""
+	echo "Return all lines (instances) of errors in diff files for"
+	echo "nodes or edges matching the provided patterns (egrep-format"
+	echo "regular expressions). Matching lines are written on standard output."
+	echo ""
+	echo "*Note: the pattern 'any' will match any label."
+	echo ""
+	echo "A single flag list token indicates whether to limit matches to"
+	echo "(N)ode label errors, (E)dge label errors, and/or files with"
+	echo "(S)egmentation errors or only (C)orrect segmentations. Including"
+	echo "^ in the flag list token will return files that do not match the"
+	echo "passed patterns."
+	exit 0
+fi
+
+# By default, return lines matching the given patterns.
+FLIST=""
+
+# Create initial list of all .diff files.
+CFILES=""
+FLAGS=""
+
+if [[ $1 == -* ]]
+then
+	# Grab flag string and shift.
+	FLAGS=$1
+	shift
+fi
+
+# Grab the patterns; shift to file list.
+OUTP=$1
+TARP=$2
+shift
+shift
+
+# Take CFILES (current files) as all passed .diff files.
+CFILES="$@"
+
+# Indicate if we want the complement of a match.
+if [[ $FLAGS == *^* ]]
+then
+	FLIST="-v"
+fi
+
+# Note that segment error/correct seg are exclusive.
+if [[ $FLAGS == *S* ]]
+then
+	CFILES=`grep -l "^*S" $@`
+elif [[ $FLAGS == *C* ]]
+then	
+	# -L flag selects inverse of the pattern ('negates' it)
+	CFILES=`grep -L "^*S" $@`
+fi
+
+# One or more non-comma characters: reg expression for 'any label' (*)
+# Create the pattern string to use with grep, then run the filter and
+# obtain the list of matching files.
+ANYLABEL="[^,][^,]*"
+MID=",1.0,:vs:,"
+
+OUTLABEL=$ANYLABEL
+TARLABEL=$ANYLABEL
+if [ "$OUTP" != "any" ]
+then
+	OUTLABEL="$OUTP"
+fi
+
+if [ "$TARP" != "any" ]
+then
+	TARLABEL="$TARP"
+fi
+
+PATTERN="$OUTLABEL$MID$TARLABEL"
+
+# Note that Node/Edge filtering is also exclusive.
+if [[ $FLAGS == *N* ]]
+then
+	PATTERN="^\*N.*$PATTERN"
+elif [[ $FLAGS == *E* ]]
+then	
+	PATTERN="^\*E.*$PATTERN"
+fi
+
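+# Use extended regular expressions to ease usage.
+# FLIST allows the complement to be returned if desired.
+# Stop if no .diff files remain (otherwise grep would wait on standard input).
+if [ -z "$CFILES" ]
+then
+	exit 0
+fi
+
+# Write matching lines on standard output. -H makes grep prefix each line
+# with its file name; only the base name of that file is kept.
+grep -H $FLIST -E "$PATTERN" $CFILES | while IFS= read -r line
+do
+	printf '%s:%s\n' "$(basename "${line%%:*}")" "${line#*:}"
+done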
+	
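Both cdiff and ldiff turn the two label arguments and the N/E flags into one extended regular expression before calling grep. The sketch below isolates that step in a hypothetical buildDiffPattern function so the resulting pattern can be printed and tested on its own. Like the scripts, it assumes both label weights in a .diff line are written as 1.0; lines with any other weight will not match.

```shell
#!/bin/bash
# Hypothetical re-creation of the pattern-building step shared by cdiff and
# ldiff, for illustration only. Arguments: flag token (may be empty),
# output-label pattern, target-label pattern ('any' matches any label).
buildDiffPattern() {
	local flags="$1" outp="$2" tarp="$3"
	local anylabel='[^,][^,]*'   # one or more non-comma characters
	local mid=',1.0,:vs:,'       # assumes both label weights are written as 1.0
	local outlabel="$anylabel" tarlabel="$anylabel"
	if [ "$outp" != "any" ]; then outlabel=$outp; fi
	if [ "$tarp" != "any" ]; then tarlabel=$tarp; fi
	local pattern="$outlabel$mid$tarlabel"
	# Node (N) and edge (E) restrictions are mutually exclusive; N is checked first.
	if [[ $flags == *N* ]]; then
		pattern="^\*N.*$pattern"
	elif [[ $flags == *E* ]]; then
		pattern="^\*E.*$pattern"
	fi
	printf '%s\n' "$pattern"
}

# Which of these made-up .diff lines have output label 'x' on a node?
pattern=$(buildDiffPattern -N x any)
printf '%s\n' '*N,3,x,1.0,:vs:,y,1.0' '*E,1,2,x,1.0,:vs:,Sup,1.0' |
	grep -E "$pattern"   # prints: *N,3,x,1.0,:vs:,y,1.0
```

Printing the pattern this way may help explain an unexpectedly empty result, for example when a label pattern contains characters that are special in extended regular expressions and may need a backslash.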
--- a/bin/confHist
+++ b/bin/confHist
+#!/bin/bash
+
+if [ $# -lt 1 ]
+then
+	echo "LgEval confHist: Structure Confusion Histogram Generator"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2013-2014"
+	echo ""
+	echo "Usage: confHist dir1 dir2 [minCount] [strokes] OR"
+	echo "       confHist fileList [minCount] [strokes]"
+	echo ""
+	echo "Creates an .html file containing structure confusion histograms"
+	echo "at the object level. The histograms visualize errors by their"
+	echo "frequency when comparing files in dir1 vs. dir2 (dir2 is 'ground truth')."
+	echo ""
+	echo "If a file list is provided, then each line of the file"
+	echo "(where each line is: 'outputFile groundTruthFile') is used for comparisons."
+	echo ""
+	echo "minCount is the minimum number of times an error should occur before"
+	echo "detailed information is provided in the confusion histogram. By default,"
+	echo "all errors are shown (minCount = 1)."
+	echo ""
+	echo "If an optional argument is provided (<strokes>), then stroke"
+	echo "confusion histograms will be constructed in addition to object"
+	echo "confusion histograms."
+	echo ""
+	echo "Output is written to the file CH_<dir1>.html or"
+	echo "CH_<fileList>.html, depending upon the arguments used."
+	exit 0
+fi
+
+if [ -d $1 ]
+then
+	# Two directories passed (hopefully).
+	# NOTE: Assumes same number of .lg files.
+	ls $1/*.lg > _f1
+	ls $2/*.lg > _f2
+	paste -d" " _f1 _f2 > d_$1
+	rm -f _f1 _f2 
+	
+	INFILE=d_$1
+	shift
+	python $LgEvalDir/src/confHists.py $INFILE $@
+	rm d_$1
+else
+	# User-provided file list.
+	python $LgEvalDir/src/confHists.py $@
+fi
+
+exit 0
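confHist pairs the two directories with ls and paste, which (as its NOTE says) assumes both directories hold the same .lg files in the same order. As an alternative, the hedged sketch below writes an 'output target' list for the file-list mode of evaluate or confHist, pairing files by base name instead of by position. The function makeFileList is hypothetical and not part of LgEval.

```shell
#!/bin/bash
# Hypothetical helper: write an 'output target' file list for the list
# mode of 'evaluate' or 'confHist', pairing .lg files by base name
# instead of by position. Targets with no matching output file are
# reported on standard error and left out of the list.
makeFileList() {
	local outDir="$1" truthDir="$2" target name
	for target in "$truthDir"/*.lg; do
		[ -e "$target" ] || continue   # glob matched nothing
		name=$(basename "$target")
		if [ -e "$outDir/$name" ]; then
			echo "$outDir/$name $target"
		else
			echo "missing output for $name" >&2
		fi
	done
}

# Usage (directory names are placeholders):
#   makeFileList myOutputs groundTruth > pairs.txt
#   evaluate pairs.txt
```

Skipping targets without an output file is a choice made for this sketch; in directory mode, evaluate instead passes every ground-truth file to evallg.py, whether or not the corresponding output file exists.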
--- a/bin/crohme2lg
+++ b/bin/crohme2lg
--- a/bin/evaluate
+++ b/bin/evaluate
@@ -10,103 +10,189 @@
 # in your .bashrc file (the initialization file for bash shell). The PATH
 # alteration will add the tools to your search path. 

-if [ $# -lt 2 ]
+if [ $# -lt 1 ]
 then
-	echo "LgEval Label graph evaluation tool"
-	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2013"
+	echo "LgEval evaluate: Label graph evaluation tool"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
 	echo ""
-	echo "Usage: evaluate outputDir groundTruthDir [t/d/s/b]"
+	echo "Usage: evaluate outputDir groundTruthDir [p/t/d/s/b] OR"
+	echo "       evaluate fileList [p/t/d/s/b]"
 	echo ""
 	echo "Evaluates all label graph (.lg) files in outputDir against"
 	echo "corresponding files in groundTruthDir. groundTruthDir is used"
 	echo "to generate the list of files to be compared (i.e. if a file is"
 	echo "not in the ground truth directory, it will not be considered)."
 	echo ""
+	echo "If a list of files is provided instead ('output target' on each line)"
+	echo "then these file pairs are used for evaluation."
+	echo ""
 	echo "Outputs"
 	echo "-----------------------------"
-	echo " Results<outputDir>/"
-	echo "    Summary : summary of performance metrics"
-	echo "    Correct : list outputDir files matching ground truth"
-	echo "    Metrics.m : metrics for all .lg files compared"
-	echo "    Diffs.diff : all differences between files"
-	echo "    ConfusionMatrix.html : node and edge label confusion matrix"
-	echo "        (errors only)"
+	echo " Results_<outputDir/fileListName>/"
+	echo "    00_RawResults.csv:      raw metrics spreadsheet"
+	echo "    00_Summary.txt:         summary of performance metrics"
+	echo "    ConfusionMatrices.csv/.html:  confusion matrix spreadsheet (errors)"
+	echo "    FileResults.csv: all evaluated files, with correct/incorrect indication"
 	echo "" 
-	echo "    Metrics/ : directory with .m (metric) and .diff (difference) file for"
-	echo "      each comparison. .dot (GraphViz) and .pdf files are generated for"
-	echo "      viewing differences between files if a third argument is provided."
+	echo "    Metrics/: directory with .csv (metric) and .diff (difference) files"
+	echo "    errorGraphs/: if dot output is requested, visualizations of files with"
+	echo "      errors are stored here (in its dot/ and pdf/ subdirectories)."
 	echo ""
 	echo "NOTE: the different visualizations of structural differences are described"
 	echo "      if you run lg2dot without arguments (object (t)ree; (d)irected graph"
 	echo "      over objects; primitive (s)egmentation graph; (b)ipartite graph over"
-	echo "      primitives."
+	echo "      primitives; (p): default directed graph over primitives)."
 	exit 0
 fi

-dir=$1
+DOTARG=""
 BNAME=`basename $1`
-truthDir=$2
-ResultsDir=Results_$BNAME
+MODE="Dir"
+
+TARGETS=""
+OUTPUTS=""
+if ! [ -d $1 ]
+then
+	echo "File List: $1"
+	MODE="List"
+	# Get the output files (first column) and targets (second column).
+	OUTPUTS=`awk '{ print $1; }' $1`
+	OUTARR=($OUTPUTS)
+	TARGETS=`awk '{ print $2; }' $1`
+
+	if [ $# -gt 1 ]
+	then
+		DOTARG=$2
+	fi
+else
+	echo "Output File Directory:  $1"
+	echo "Ground Truth Directory: $2"

+	TARGETS=`ls $2/*.lg`
+
+	if [ $# -gt 2 ]
+	then
+		DOTARG=$3
+	fi
+fi
+echo ""
+
+ResultsDir=Results_$BNAME
 if ! [ -d $ResultsDir ]
 then
 	mkdir $ResultsDir
 	mkdir $ResultsDir/Metrics
+
+	if [ "$DOTARG" != "" ]
+	then
+		mkdir $ResultsDir/errorGraphs
+		mkdir $ResultsDir/errorGraphs/dot
+		mkdir $ResultsDir/errorGraphs/pdf
+	fi
 fi

-echo "Output File Directory:  $1"
-echo "Ground Truth Directory: $2"
-echo ""

-# Compute all .m metrics outputs (per-file), and .diff results (per-file).
+# Compute all .csv metrics outputs (per-file), and .diff results (per-file).
 echo "Evaluating files..."
-PREFIX=res_
-for file in `ls $truthDir/*.lg`
+
+INDEX=0
+for file in $TARGETS
 do
 	FNAME=`basename $file .lg`
-	nextFile=$dir/$FNAME.lg
-	if  [[ ! -e $ResultsDir/Metrics/$FNAME.m ]]
+	nextFile="_ERROR_"
+	if [ $MODE == "Dir" ]
+	then
+		# Replace '//' with '/' (e.g. when outputDir ends in '/').
+		nextFile=`echo "$1/$FNAME.lg" | perl -p -e "s/\/\//\//g"`
+	else
+		# Index to the next input file.
+		nextFile=${OUTARR[INDEX]}
+	fi
+
+	if  [[ ! -e $ResultsDir/Metrics/$FNAME.csv ]]
 	then
 		# NOTE: the script convertCrohmeLg can be used to convert
 		#       crohme .inkml files to .lg files.
 		echo "  >> Comparing $FNAME.lg"

-		python $LgEvalDir/src/evallg.py $nextFile $file m > $ResultsDir/Metrics/$FNAME.m
-		DIFF=`python $LgEvalDir/src/evallg.py $nextFile $file diff`
+		python $LgEvalDir/src/evallg.py $nextFile $file m INTER > $ResultsDir/Metrics/$FNAME.csv
+		DIFF=`python $LgEvalDir/src/evallg.py $nextFile $file diff INTER`
+		CORRECT="Correct"
 		if [ -n "$DIFF" ]
 		then
 			echo "$DIFF" > $ResultsDir/Metrics/$FNAME.diff 
+			# Record files with errors in the "Errors" file.
+			#echo "$nextFile" >> $ResultsDir/FilesIncorrect.txt

 			# If a graph type argument (DOTARG) was provided, generate .dot and
 			# .pdf files to visualize differences between graphs.
-			if [ $# -gt 2 ]
+			if [ "$DOTARG" != "" ]
 			then
-				lg2dot $nextFile $file $3 
-				mv $FNAME.dot $FNAME.pdf $ResultsDir/Metrics/
+				if [ "$DOTARG" == "p" ]
+				then
+					lg2dot $nextFile $file
+				else
+					lg2dot $nextFile $file "$DOTARG"
+				fi
+				mv $FNAME.dot $ResultsDir/errorGraphs/dot
+				mv $FNAME.pdf $ResultsDir/errorGraphs/pdf
 			fi
+			
+			CORRECT="Incorrect"
 		else
 			rm -f $ResultsDir/Metrics/$FNAME.diff
-			echo "$nextFile" >> $ResultsDir/Correct
+			# Record correct files in the "Correct" file.
+			#echo "$nextFile" >> $ResultsDir/FilesCorrect.txt
 		fi
+
+		# Add record of evaluating the file.
+		echo "$nextFile, $CORRECT" >> $ResultsDir/FileResults.csv
 	else
 		echo "    Already processed: $file"
 	fi
+
+	INDEX=$((INDEX+1))
 done

 # Compile all metrics/diffs,
 # and then compute metric summaries and confusion matrices.
-cat $ResultsDir/Metrics/*.m > $ResultsDir/Metrics.m
+cat $ResultsDir/Metrics/*.csv > $ResultsDir/$BNAME.csv
 ALLDIFFS=`ls $ResultsDir/Metrics | grep .diff`
 if [ -n "$ALLDIFFS" ]
 then
-	cat $ResultsDir/Metrics/*.diff > $ResultsDir/Diffs.diff
+	cat $ResultsDir/Metrics/*.diff > $ResultsDir/$BNAME.diff
 else
-	touch $ResultsDir/__NoErrors
-	touch $ResultsDir/Diffs.diff  # empty - no errors.
+	touch $ResultsDir/00_NoErrors
+	touch $ResultsDir/$BNAME.diff  # empty - no errors.
 fi

-python $LgEvalDir/src/sumMetric.py $ResultsDir/Metrics.m > $ResultsDir/__Summary
-python $LgEvalDir/src/sumDiff.py $ResultsDir/Diffs.diff html > $ResultsDir/ConfusionMatrix.html
+python $LgEvalDir/src/sumMetric.py $ResultsDir/$BNAME.csv > $ResultsDir/00_Summary.txt
+python $LgEvalDir/src/sumDiff.py $ResultsDir/$BNAME.diff html > $ResultsDir/ConfusionMatrices.html
+python $LgEvalDir/src/sumDiff.py $ResultsDir/$BNAME.diff > $ResultsDir/ConfusionMatrices.csv
+
+# RZ Oct. 2014: Create spreadsheet pairing file names with metrics.
+# Clean up raw metric data to make the file smaller and simpler.
+# Use awk to select the odd (header) and even (data) columns, and head to
+# keep a single header row, which is then concatenated with the data rows.
+awk -F',' '{ for (i=1;i<=NF;i+=2) printf ("%s%c", $i, i + 2 <= NF ? "," : "\n")}' $ResultsDir/$BNAME.csv > $ResultsDir/Headers.csv
+
+# Obtain the first row for data labels; insert "File" and "Result" labels in the first two columns.
+head -n 1 $ResultsDir/Headers.csv > $ResultsDir/HeaderRow.csv
+HEAD=`cat $ResultsDir/HeaderRow.csv`
+echo "File,Result,$HEAD" > $ResultsDir/HeaderRow.csv
+
+awk -F',' '{ for (i=2;i<=NF;i+=2) printf ("%s%c", $i, i + 2 <= NF ? "," : "\n")}' $ResultsDir/$BNAME.csv > $ResultsDir/Data.csv
+
+# Combine file names with raw data metrics, then add header labels.
+paste -d , $ResultsDir/FileResults.csv $ResultsDir/Data.csv > $ResultsDir/DataNew.csv
+cat $ResultsDir/HeaderRow.csv $ResultsDir/DataNew.csv > $ResultsDir/00_RawResults.csv
+
+# Clean up.
+rm -f $ResultsDir/Headers.csv $ResultsDir/HeaderRow.csv $ResultsDir/Data.csv
+rm -f $ResultsDir/DataNew.csv
+
+# Remove the compiled metrics and differences, but leave the individual metric/diff
+# files in Metrics to support debugging for malformed or missing files, etc.
+rm -f $ResultsDir/$BNAME.csv $ResultsDir/$BNAME.diff

 echo "done."
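The two awk commands near the end of evaluate assume that each per-file metrics row alternates metric names (odd columns) and values (even columns). The sketch below wraps that logic in a hypothetical splitColumns function and applies it to made-up rows, so the header/data split can be checked in isolation.

```shell
#!/bin/bash
# Illustration of how 'evaluate' splits per-file metric rows into a header
# row (odd columns: metric names) and data rows (even columns: values).
# splitColumns is hypothetical; the sample rows below are made up.
splitColumns() {
	# $1: starting column (1 = names, 2 = values); $2: input file
	awk -F',' -v start="$1" '{
		for (i = start + 0; i <= NF; i += 2)
			printf("%s%c", $i, i + 2 <= NF ? "," : "\n")
	}' "$2"
}

metrics=$(mktemp)
printf '%s\n' 'nSeg,3,D_C,1' 'nSeg,5,D_C,0' > "$metrics"
splitColumns 1 "$metrics" | head -n 1   # prints: nSeg,D_C
splitColumns 2 "$metrics"               # prints 3,1 and 5,0 on separate lines
rm -f "$metrics"
```

Because evaluate joins FileResults.csv and Data.csv with paste, the two files appear to need their rows in the same order; this is worth keeping in mind when a file list is not sorted by file name, or when some files were skipped as already processed.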

--- a/bin/evaluateMat
+++ b/bin/evaluateMat
+#!/bin/bash
+
+# Make sure that CROHMELibDir and LgEvalDir are defined in
+# your shell environment, e.g. by including:
+#	
+#	export LgEvalDir=<path_to_LgEval>
+#	export CROHMELibDir=<path_to_CROHMELib>       		
+#	export PATH=$PATH:$CROHMELibDir/bin:$LgEvalDir/bin
+# 
+# in your .bashrc file (the initialization file for bash shell). The PATH
+# alteration will add the tools to your search path. 
+
+if [ $# -lt 2 ]
+then
+	echo "LgEval evaluateMat: Label graph matrix evaluation (CROHME 2014)"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2013"
+	echo ""
+	echo "Usage: evaluateMat outputDir groundTruthDir [t/d/s/b]"
+	echo ""
+	echo "Evaluates all label graph (.lg) files in outputDir against"
+	echo "corresponding files in groundTruthDir. groundTruthDir is used"
+	echo "to generate the list of files to be compared (i.e. if a file is"
+	echo "not in the ground truth directory, it will not be considered)."
+	echo ""
+	echo "*The label graphs are filtered for different objects and"
+	echo " relationship types (e.g. for Rows and Columns), with"
+	echo " metrics compiled for the different levels/types of structure."
+	echo ""
+	echo "Outputs"
+	echo "-----------------------------"
+	echo " Results_<outputDir>/"
+	echo "    __Summary : summary of performance metrics"
+	echo "    _<Mat/Cell/Row/Col/Symb>_Summary : metric summaries by structure type"
+	echo "    Correct : list outputDir files matching ground truth"
+	echo "    Metrics.csv : metrics for all .lg files compared"
+	echo "    Diffs.diff : all differences between files"
+	echo "    ConfusionMatrix.html : node and edge label confusion matrix"
+	echo "        (errors only)"
+	echo "" 
+	echo "    Metrics/ : directory with .csv (metric) and .diff (difference) file for"
+	echo "      each comparison. .dot (GraphViz) and .pdf files are generated for"
+	echo "      viewing differences between files if a third argument is provided."
+	echo ""
+	echo "NOTE: the different visualizations of structural differences are described"
+	echo "      if you run lg2dot without arguments (object (t)ree; (d)irected graph"
+	echo "      over objects; primitive (s)egmentation graph; (b)ipartite graph over"
+	echo "      primitives."
+	exit 0
+fi
+
+dir=$1
+BNAME=`basename $1`
+truthDir=$2
+ResultsDir=Results_$BNAME
+
+
+mkdir -p $ResultsDir/Metrics
+mkdir -p $ResultsDir/MatMetrics
+
+
+echo "Output File Directory:  $1"
+echo "Ground Truth Directory: $2"
+echo ""
+
+# Compute all .csv metrics outputs (per-file), and .diff results (per-file).
+echo "Evaluating files..."
+PREFIX=res_
+for file in `ls $truthDir/*.lg`
+do
+	FNAME=`basename $file .lg`
+	nextFile=$dir/$FNAME.lg
+	if  [[ ! -e $ResultsDir/Metrics/$FNAME.csv ]]
+	then
+		# NOTE: the script convertCrohmeLg can be used to convert
+		#       crohme .inkml files to .lg files.
+		echo "  >> Comparing $FNAME.lg"
+
+		python $LgEvalDir/src/evallg.py $nextFile $file m > $ResultsDir/Metrics/$FNAME.csv
+		python $LgEvalDir/src/evallg.py $nextFile $file MATRIX $ResultsDir/MatMetrics/$FNAME
+		
+		DIFF=`python $LgEvalDir/src/evallg.py $nextFile $file diff`
+		if [ -n "$DIFF" ]
+		then
+			echo "$DIFF" > $ResultsDir/Metrics/$FNAME.diff 
+			
+			
+		else
+			rm -f $ResultsDir/Metrics/$FNAME.diff
+			echo "$nextFile" >> $ResultsDir/Correct
+		fi
+	else
+		echo "    Already processed: $file"
+	fi
+done
+
+# Compile all metrics/diffs,
+# and then compute metric summaries and confusion matrices.
+cat $ResultsDir/Metrics/*.csv > $ResultsDir/Metrics.csv
+ALLDIFFS=`ls $ResultsDir/Metrics | grep .diff`
+if [ -n "$ALLDIFFS" ]
+then
+	cat $ResultsDir/Metrics/*.diff > $ResultsDir/Diffs.diff
+else
+	touch $ResultsDir/__NoErrors
+	touch $ResultsDir/Diffs.diff  # empty - no errors.
+fi
+
+python $LgEvalDir/src/sumMetric.py $ResultsDir/Metrics.csv > $ResultsDir/__Summary
+python $LgEvalDir/src/sumDiff.py $ResultsDir/Diffs.diff html > $ResultsDir/ConfusionMatrix.html
+
+for typErr in Mat Cell Row Col Symb
+do
+	cat $ResultsDir/MatMetrics/*${typErr}.csv > $ResultsDir/Metrics${typErr}.csv
+	python $LgEvalDir/src/sumMetric.py $ResultsDir/Metrics${typErr}.csv > $ResultsDir/_${typErr}_Summary
+done
+
+echo "done."
+
--- a/bin/ldiff
+++ b/bin/ldiff
+#!/bin/bash
+
+if [ $# -lt 3 ]
+then
+	echo "LgEval ldiff: List Files with Common Errors"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
+	echo ""
+	echo "Usage: ldiff [-NESC^] outputPattern targetPattern <files>"
+	echo ""
+	echo "Create a list of files that contain errors on label graph"
+	echo "nodes or edges matching the provided patterns (egrep-format"
+	echo "regular expressions). Matching files are written on standard output."
+	echo ""
+	echo "*Note: the pattern 'any' will match any label."
+	echo ""
+	echo "A single flag list token indicates whether to limit matches to"
+	echo "(N)ode label errors, (E)dge label errors, and/or files with"
+	echo "(S)egmentation errors or only (C)orrect segmentations. Including"
+	echo "^ in the flag list token will return files that do not match the"
+	echo "passed patterns."
+	exit 0
+fi
+
+# By default, return files matching the given patterns.
+FLIST="-l"
+
+# Create initial list of all .diff files.
+CFILES=""
+FLAGS=""
+
+if [[ $1 == -* ]]
+then
+	# Grab flag string and shift.
+	FLAGS=$1
+	shift
+fi
+
+# Grab the patterns; shift to file list.
+OUTP=$1
+TARP=$2
+shift
+shift
+
+# Take CFILES (current files) as all passed .diff files.
+CFILES="$@"
+
+
+# Indicate if we want the complement of a match.
+if [[ $FLAGS == *^* ]]
+then
+	FLIST="-L"
+fi
+
+# Note that segment error/correct seg are exclusive.
+if [[ $FLAGS == *S* ]]
+then
+	CFILES=`grep -l "^*S" $@`
+elif [[ $FLAGS == *C* ]]
+then	
+	# -L flag selects inverse of the pattern ('negates' it)
+	CFILES=`grep -L "^*S" $@`
+fi
+
+# one or more non-comma characters: reg expression for 'any label' (*)
+# Create the pattern string to use with grep, then run the filter and
+# obtain the list of matching files.
+ANYLABEL="[^,][^,]*"
+MID=",1.0,:vs:,"
+
+OUTLABEL=$ANYLABEL
+TARLABEL=$ANYLABEL
+if [ "$OUTP" != "any" ]
+then
+	OUTLABEL="$OUTP"
+fi
+
+if [ "$TARP" != "any" ]
+then
+	TARLABEL="$TARP"
+fi
+
+PATTERN="$OUTLABEL$MID$TARLABEL"
+
+# Note that Node/Edge filtering is also exclusive.
+if [[ $FLAGS == *N* ]]
+then
+	PATTERN="^\*N.*$PATTERN"
+elif [[ $FLAGS == *E* ]]
+then	
+	PATTERN="^\*E.*$PATTERN"
+fi
+
+# Use extended regular expressions to ease usage.
+# FLIST allows the complement to be returned if desired.
+CFILES=`grep $FLIST -E "$PATTERN" $CFILES`
+
+# Write the matching file names on standard output.
+for file in $CFILES
+do
+	echo `basename $file`
+done
+	
--- a/bin/lg2NE
+++ b/bin/lg2NE
+#!/bin/bash
+
+if [ $# -lt 1 ]
+then
+	echo "LgEval lg2NE: Node-Edge Label Graph Format Converter"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
+	echo ""
+	echo "Usage: lg2NE <file.lg>"
+	echo ""
+	echo "Writes the contents of file.lg as an NE label graph file"
+	echo "on standard output."
+
+	exit 0
+fi
+
+python $LgEvalDir/src/lg2NE.py $1
+
+	
--- a/bin/lg2OR
+++ b/bin/lg2OR
+#!/bin/bash
+
+if [ $# -lt 1 ]
+then
+	echo "LgEval lg2OR: Object-Relationship Label Graph Format Converter"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
+	echo ""
+	echo "Usage: lg2OR <file.lg>"
+	echo ""
+	echo "Writes the contents of file.lg as an OR label graph file"
+	echo "on standard output."
+
+	exit 0
+fi
+
+python $LgEvalDir/src/lg2OR.py $1
+
+	
--- a/bin/lg2dot
+++ b/bin/lg2dot
@@ -12,10 +12,10 @@

 if [ $# -lt 1 ]
 then
-	echo "LgEval Label graph to dot (GraphViz) converter"
-	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2013"
+	echo "LgEval lg2dot: Label graph to dot (GraphViz) converter"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
 	echo ""
-	echo "Usage: lg2mml file1.lg [file2.lg] [graph_type]"
+	echo "Usage: lg2dot file1.lg [file2.lg] [graph_type]"
 	echo ""
 	echo "Converts a label graph file to a .dot file,"
 	echo "which can then be converted to a .pdf, .png or other"
@@ -26,8 +26,9 @@ then
 	echo "file) is visualized."
 	echo ""
 	echo "The graph_type argument may be one of the following:"
-	echo "   - [default; no argument] a bipartite graph over strokes."
-	echo "   - s : a bipartite segmentation graph (shows strokes in symbols)"
+	echo "   - [default; no argument] a directed graph over strokes."
+	echo "   - b : a bipartite graph over primitives"
+	echo "   - s : a bipartite segmentation graph"
 	echo "   - d : a directed acyclic graph over strokes"
 	echo "   - t : a tree (NOTE: requires a valid hierarchical structure)"
 	exit 0
@@ -35,17 +36,7 @@ fi

 BNAME=`basename $1 .lg`

-if [ $# -eq 1 ]
-then
-	python $LgEvalDir/src/lg2dot.py $1 > $BNAME.dot
-elif [ $# -eq 2 ]
-then
-	# Use tree output by default.
-	python $LgEvalDir/src/lg2dot.py $1 $2  > $BNAME.dot
-else
-	python $LgEvalDir/src/lg2dot.py $1 $2 $3 > $BNAME.dot
-fi
-
-# Call dot and generate a .pdf file.
+# Generate dot file, then call dot and generate a .pdf file.
+python "$LgEvalDir/src/lg2dot.py" "$@" > $BNAME.dot
 dot -Tpdf $BNAME.dot -o $BNAME.pdf 
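The new lg2dot forwards all of its arguments to lg2dot.py in a single call instead of branching on the argument count. The sketch below (bash assumed; count_args, forward_quoted and forward_unquoted are hypothetical names used only for illustration) shows why the quoted form "$@" is the safer way to forward arguments: it keeps a file name containing spaces as a single argument.

```shell
#!/bin/bash
# Illustrative only: compare forwarding arguments with "$@" and unquoted $@.

# Print how many arguments were received.
count_args() {
	echo $#
}

# "$@" expands to each original argument as a separate, intact word.
forward_quoted() {
	count_args "$@"
}

# Unquoted $@ is subject to word splitting, so "my file.lg" becomes two words.
forward_unquoted() {
	count_args $@
}

forward_quoted "my file.lg" s     # prints 2
forward_unquoted "my file.lg" s   # prints 3
```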

--- a/bin/lg2lgtree
+++ b/bin/lg2lgtree
@@ -12,8 +12,8 @@

 if [ $# -lt 2 ]
 then
-	echo "LgEval Label graph to Label graph tree converter "
-	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2013"
+	echo "LgEval lg2lgtree: Label graph to label graph tree converter"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
 	echo ""
 	echo "Usage: lg2lgtree <indir> <outdir>"
 	echo ""

--- a/bin/lg2mml
+++ b/bin/lg2mml
@@ -12,12 +12,12 @@

 if [ $# -lt 1 ]
 then
-	echo "LgEval Label graph to text converter"
-	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2013"
+	echo "LgEval lg2mml: Label graph to Presentation MathML converter"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
 	echo ""
 	echo "Usage: lg2mml file.lg"
 	echo ""
-	echo "Converts a label graph file files to a MathML file,"
+	echo "Converts a label graph file to a MathML file,"
 	echo "written as file.mml to the current directory."
 	exit 0
 fi

--- a/bin/lgfilter
+++ b/bin/lgfilter
@@ -12,8 +12,8 @@

 if [ $# -lt 1 ]
 then
-	echo "LgEval Label graph edge filter"
-	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2013"
+	echo "LgEval lgfilter: Label graph edge filter"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
 	echo ""
 	echo "Usage: lgfilter <infile> [outfile]"
 	echo ""

--- a/bin/old/crohme2012_eval
+++ b/bin/old/crohme2012_eval
-#!/bin/bash
-
-# Make sure that CROHMELibDir and LgEvalDir are defined in
-# your shell enviroment, e.g. by including:
-#	
-#	export LgEvalDir=<path_to_LgEval>
-#	export CROHMELibDir=<path_to_CROHMELib>       		
-#	export PATH=$PATH:$CROHMELibDir/bin:$LgEvalDir/bin
-# 
-# in your .bashrc file (the initialization file for bash shell). The PATH
-# alteration will add the tools to your search path. 
-
-if [ $# -lt 1 ]
-then
-	echo "Usage: eval <outputDir> [groundTruthDir]"
-	echo ""
-	echo "Compares all .lg files in groundTruthDir to to matching"
-	echo "files (currently, with res_ prefix, and .inkml.lg sufix)."
-	echo ""
-	echo "Note: this script compares label graphs (.lg files)."
-	echo ""
-	echo "Output files:"
-	echo "  - .m (metric) and .diff (difference) files for"
-	echo "    each test file in outputDir"
-	echo "  - ./Correct[outputDir] containing the list of correct"
-	echo "     files from outputDir (i.e. matching groundTruth)"
-	echo "  - ./Metrics[outputDir] compiling all metric values"
-	echo "  - ./Diffs[outputDir] compiling all differences"
-	echo ""
-	echo "  - ./Results[outputDir] summarizing performance metrics"
-	echo "  - ./Diffs[outputDir].csv, providing node and edge label"
-	echo "    confusion matrices (NOTE: currently errors only - no correct counts)"
-
-	exit 0
-fi
-
-dir=$1
-echo "Evaluating recognizer output files in $dir..."
-# Remove summary files.
-rm -f Correct$dir Results$dir Diffs$dir Metrics$dir
-
-
-# Compute all .m metrics outputs (per-file), and .diff results (per-file).
-cd $dir
-# Clean up old evaluation files
-rm -f *.m *.diff  
-
-truthDir=testDataGT
-if [ $# -gt 1 ]
-then
-	truthDir=$2
-fi
-
-PREFIX=res_
-#PREFIX=""   # For use with ground truth data on itself.
-for file in ../$truthDir/*.lg
-do
-	nextFile=`basename $file`
-
-	python evallg.py $PREFIX$nextFile $file m > $nextFile.m
-	DIFF=`python evallg.py $PREFIX$nextFile $file diff`
-	if [ -n "$DIFF" ]
-	then
-		echo "$DIFF" > $nextFile.diff
-	else
-		echo "$PREFIX$nextFile" >> ../Correct$dir
-	fi
-done
-
-# Back to main directory - compile all metrics/diffs,
-# and then compute metric summaries and confusion matrices.
-cd ..
-cat $dir/*.m > ./Metrics$dir
-cat $dir/*.diff > ./Diffs$dir
-
-python sumMetric.py Metrics$dir > Results$dir
-python sumDiff.py Diffs$dir > Diffs$dir.csv
-
-echo "finished."
-
--- a/bin/relabelEdges
+++ b/bin/relabelEdges
+#!/bin/bash
+
+if [ $# -lt 1 ]
+then
+	echo "LgEval relabelEdges: Edge relabeling tool"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
+	echo ""
+	echo "Usage: relabelEdges <file.lg> [Label1 Replacement1] [Label2 Replacement2] ..."
+	echo ""
+	echo "Replaces edge labels in a 'raw' label graph file (using node (N) and"
+	echo "edge (E) entries for primitives), writing the result to standard output."
+	echo "Pairs of edge labels and their replacements are given as arguments."
+	echo ""
+	echo "This tool was created for older files in which * was used to represent"
+	echo "merge relationships, and in which edge relationship labels match symbol"
+	echo "labels. Such labels conflict with segmentation edges in the new"
+	echo "label graph representation (e.g. if R is used both to label"
+	echo "symbols and to represent a Right-of relationship)."
+
+	exit 0
+fi
+
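+# Separate node (N) and comment (#) lines from edge (E) lines; only edge
+# labels are rewritten. Temporary files are created with mktemp so that
+# files in the current directory are not overwritten.
+HEAD_TEMP=`mktemp`
+EDGE_TEMP=`mktemp`
+grep "^N,\|^#" "$1" > "$HEAD_TEMP"
+grep "^E," "$1" > "$EDGE_TEMP"
+
+# Shift the arguments to allow for simple iteration through pairs.
+shift
+
+# Labels are matched literally (\Q...\E), so labels such as * are safe.
+# Note that a label is replaced wherever it occurs on an edge line.
+while test $# -ge 2
+do
+	FROM="$1" TO="$2" perl -p -i -e 's/\Q$ENV{FROM}\E/$ENV{TO}/g' "$EDGE_TEMP"
+	shift
+	shift
+done
+
+cat "$HEAD_TEMP"
+cat "$EDGE_TEMP"
+
+rm -f "$HEAD_TEMP" "$EDGE_TEMP"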
+
+exit 0
+
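relabelEdges rewrites a label wherever it appears on an edge line, so a short label such as R could in principle also match text inside a primitive identifier. A minimal sketch of a field-anchored alternative follows, assuming edge entries have the form `E, <id1>, <id2>, <label>, <weight>` (the label is the fourth comma-separated field). The function relabel_edge_labels and the RELABEL_PAIRS variable are hypothetical and not part of LgEval; unlike relabelEdges, the sketch reads standard input and keeps lines in their original order.

```shell
#!/bin/bash
# Hypothetical helper (not part of LgEval): read an .lg file on standard
# input and write it to standard output, replacing edge labels according to
# (label, replacement) argument pairs. Assumes edge lines look like
# "E, <id1>, <id2>, <label>, <weight>", i.e. the label is field 4.
relabel_edge_labels() {
	local pairs=""
	# Collect arguments as "label<TAB>replacement" lines.
	while [ $# -ge 2 ]; do
		pairs="$pairs$1"$'\t'"$2"$'\n'
		shift 2
	done
	# Pass the pairs through the environment so awk does not
	# interpret backslash escapes in them.
	RELABEL_PAIRS="$pairs" awk '
	BEGIN {
		# Build a lookup table: old label -> new label.
		n = split(ENVIRON["RELABEL_PAIRS"], entries, "\n")
		for (i = 1; i <= n; i++)
			if (split(entries[i], kv, "\t") == 2)
				relabel[kv[1]] = kv[2]
	}
	/^E,/ {
		nf = split($0, f, ",")
		label = f[4]
		gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim surrounding blanks
		if (nf >= 4 && (label in relabel)) {
			f[4] = " " relabel[label]
			out = f[1]
			for (i = 2; i <= nf; i++)
				out = out "," f[i]
			print out
			next
		}
	}
	# Node lines, comments and unmatched edges pass through unchanged.
	{ print }
	'
}
```

For example, `relabel_edge_labels R Right A Above < old.lg > new.lg` would rewrite only the edge label fields, leaving node labels (such as a symbol labeled R) untouched.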
--- a/bin/relabelOldCROHME
+++ b/bin/relabelOldCROHME
+#!/bin/bash
+
+if [ $# -lt 1 ]
+then
+	echo "LgEval relabelOldCROHME: Edge Relabeler for Old CROHME Files"
+	echo "Copyright (c) R. Zanibbi, H. Mouchere, 2012-2014"
+	echo ""
+	echo "Usage: relabelOldCROHME <dir>"
+	echo ""
+	echo "Relabels old 'raw' label graph files (with N and E entries) in <dir>,"
+	echo "converting short relationship names to the longer ones used for"
+	echo "CROHME 2014. Files are modified in place."
+	exit 0
+fi
+
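+for file in "$1"/*.lg
+do
+	# Replace (R)ight, (A)bove, (B)elow and (I)nside relationships.
+	# Write to a temporary file and only overwrite the original on success.
+	TMP=`mktemp`
+	relabelEdges "$file" R Right A Above B Below I Inside > "$TMP" && cat "$TMP" > "$file"
+	rm -f "$TMP"
+done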
+