public abstract class AbstractSequenceClassifier<IN extends CoreMap>
extends java.lang.Object
implements java.util.function.Function<java.lang.String,java.lang.String>
An implementation must implement these 5 abstract methods:
List<IN> classify(List<IN> document);
List<IN> classifyWithGlobalInformation(List<IN> tokenSequence, final CoreMap document, final CoreMap sentence);
void train(Collection<List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter);
void serializeClassifier(String serializePath);
void loadClassifier(ObjectInputStream in, Properties props) throws IOException,
ClassCastException, ClassNotFoundException;
but a runtime (or rule-based) implementation can usefully implement just the first,
and throw UnsupportedOperationException for the rest. Additionally, the following method throws
UnsupportedOperationException by default, but is implemented for some classifiers:
Triple<Counter<Integer>, Counter<Integer>, TwoDimensionalCounter<Integer,String>> printProbsDocument(List<IN> document);
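The rule-based case described above can be sketched without the library. This is only an illustration of the pattern: Token, SketchSequenceClassifier, and CapitalizedWordClassifier are hypothetical stand-ins (not CoreNLP classes), the abstract class mirrors just three of the abstract methods, and the capitalization rule is a toy assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token; the real class works on types that extend CoreMap.
final class Token {
    final String word;
    String answer = "O"; // background label until classified
    Token(String word) { this.word = word; }
}

// Hypothetical stand-in that mirrors the shape of three of the abstract methods.
abstract class SketchSequenceClassifier {
    abstract List<Token> classify(List<Token> document);
    abstract void train(List<List<Token>> docs);
    abstract void serializeClassifier(String serializePath);
}

// A rule-based classifier: only classify() does real work.
final class CapitalizedWordClassifier extends SketchSequenceClassifier {
    @Override
    List<Token> classify(List<Token> document) {
        for (Token t : document) {
            // Toy rule (an assumption for illustration): capitalized words are entities.
            if (!t.word.isEmpty() && Character.isUpperCase(t.word.charAt(0))) {
                t.answer = "ENTITY";
            }
        }
        return document; // labels are added in place and the same list is returned
    }

    @Override
    void train(List<List<Token>> docs) {
        throw new UnsupportedOperationException("rule-based: nothing to train");
    }

    @Override
    void serializeClassifier(String serializePath) {
        throw new UnsupportedOperationException("rule-based: nothing to serialize");
    }

    // Helper for building a document from plain words.
    static List<Token> tokens(String... words) {
        List<Token> out = new ArrayList<>();
        for (String w : words) out.add(new Token(w));
        return out;
    }

    public static void main(String[] args) {
        List<Token> doc = new CapitalizedWordClassifier().classify(tokens("went", "to", "Paris"));
        for (Token t : doc) System.out.println(t.word + "/" + t.answer);
    }
}
```

Callers of such a classifier should expect UnsupportedOperationException from the training and serialization entry points.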
Modifier and Type | Field and Description |
---|---|
Index<java.lang.String> |
classIndex |
java.util.List<FeatureFactory<IN>> |
featureFactories |
SeqClassifierFlags |
flags |
protected MaxSizeConcurrentHashSet<java.lang.String> |
knownLCWords
Different threads can add or query knownLCWords at the same time,
so we need a concurrent data structure.
|
protected IN |
pad |
int |
windowSize |
Constructor and Description |
---|
AbstractSequenceClassifier(java.util.Properties props)
Construct a SeqClassifierFlags object based on the passed in properties,
and then call the other constructor.
|
AbstractSequenceClassifier(SeqClassifierFlags flags)
Initialize the featureFactory and other variables based on the passed in
flags.
|
Modifier and Type | Method and Description |
---|---|
java.lang.String |
apply(java.lang.String in)
Maps a String input to an XML-formatted rendition of applying NER to the
String.
|
java.lang.String |
backgroundSymbol()
Returns the background class for the classifier.
|
abstract java.util.List<IN> |
classify(java.util.List<IN> document)
Classify a List of something that extends CoreMap. |
java.util.List<java.util.List<IN>> |
classify(java.lang.String str)
Classify the tokens in a String.
|
void |
classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents,
DocumentReaderAndWriter<IN> readerWriter,
boolean outputScores) |
void |
classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents,
java.io.PrintWriter printWriter,
DocumentReaderAndWriter<IN> readerWriter,
boolean outputScores) |
void |
classifyAndWriteAnswers(java.lang.String testFile)
Load a test file, run the classifier on it, and then print the answers to
stdout (with timing to stderr).
|
void |
classifyAndWriteAnswers(java.lang.String testFile,
DocumentReaderAndWriter<IN> readerWriter,
boolean outputScores)
Load a test file, run the classifier on it, and then print the answers to
stdout (with timing to stderr).
|
void |
classifyAndWriteAnswers(java.lang.String testFile,
java.io.OutputStream outStream,
DocumentReaderAndWriter<IN> readerWriter,
boolean outputScores)
If the flag outputEncoding is defined, the output is written in that
character encoding, otherwise in the system default character encoding. |
void |
classifyAndWriteAnswers(java.lang.String baseDir,
java.lang.String filePattern,
DocumentReaderAndWriter<IN> readerWriter,
boolean outputScores) |
void |
classifyAndWriteAnswersKBest(ObjectBank<java.util.List<IN>> documents,
int k,
java.io.PrintWriter printWriter,
DocumentReaderAndWriter<IN> readerAndWriter)
Run the classifier on the documents in an ObjectBank, and print the
answers to a given PrintWriter (with timing to stderr).
|
void |
classifyAndWriteAnswersKBest(java.lang.String testFile,
int k,
DocumentReaderAndWriter<IN> readerAndWriter)
Load a test file, run the classifier on it, and then print the answers to
stdout (with timing to stderr).
|
void |
classifyAndWriteViterbiSearchGraph(java.lang.String testFile,
java.lang.String searchGraphPrefix,
DocumentReaderAndWriter<IN> readerAndWriter)
Load a test file, run the classifier on it, and then write a Viterbi search
graph for each sequence.
|
java.util.List<java.util.List<IN>> |
classifyFile(java.lang.String filename)
Classify the contents of a file.
|
void |
classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles) |
void |
classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles,
DocumentReaderAndWriter<IN> readerWriter,
boolean outputScores) |
Counter<java.util.List<IN>> |
classifyKBest(java.util.List<IN> doc,
java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField,
int k)
Takes a list of tokens and provides the K best sequence labelings of these tokens with their scores.
|
java.util.List<java.util.List<IN>> |
classifyRaw(java.lang.String str,
DocumentReaderAndWriter<IN> readerAndWriter)
Classify the tokens in a String.
|
java.util.List<IN> |
classifySentence(java.util.List<? extends HasWord> sentence)
Classify a List of IN.
|
java.util.List<IN> |
classifySentenceWithGlobalInformation(java.util.List<? extends HasWord> tokenSequence,
CoreMap doc,
CoreMap sentence)
Classify a List of IN using whatever additional information is passed in doc and sentence.
|
void |
classifyStdin() |
void |
classifyStdin(DocumentReaderAndWriter<IN> readerWriter) |
java.util.List<Triple<java.lang.String,java.lang.Integer,java.lang.Integer>> |
classifyToCharacterOffsets(java.lang.String sentences)
Classify the contents of a String to classified character offset spans. |
java.lang.String |
classifyToString(java.lang.String sentences)
Classify the contents of a String to a tagged word/class String.
|
java.lang.String |
classifyToString(java.lang.String sentences,
java.lang.String outputFormat,
boolean preserveSpacing)
Classify the contents of a String to one of several String
representations that show the classes. |
abstract java.util.List<IN> |
classifyWithGlobalInformation(java.util.List<IN> tokenSequence,
CoreMap document,
CoreMap sentence)
Classify a List of something that extends CoreMap using as
additional information whatever is stored in the document and sentence. |
java.lang.String |
classifyWithInlineXML(java.lang.String sentences)
Classify the contents of a String. |
boolean |
countResults(java.util.List<IN> doc,
Counter<java.lang.String> entityTP,
Counter<java.lang.String> entityFP,
Counter<java.lang.String> entityFN)
Count results using a method appropriate for the tag scheme being used.
|
static boolean |
countResultsSegmenter(java.util.List<? extends CoreMap> doc,
Counter<java.lang.String> entityTP,
Counter<java.lang.String> entityFP,
Counter<java.lang.String> entityFN) |
DocumentReaderAndWriter<IN> |
defaultReaderAndWriter() |
void |
dumpFeatures(java.util.Collection<java.util.List<IN>> documents)
Does nothing by default.
|
void |
finalizeClassification(CoreMap document)
Classification is finished for the document.
|
java.util.Set<java.lang.String> |
getKnownLCWords() |
Sampler<java.util.List<IN>> |
getSampler(java.util.List<IN> input) |
SequenceModel |
getSequenceModel(java.util.List<IN> doc) |
DFSA<java.lang.String,java.lang.Integer> |
getViterbiSearchGraph(java.util.List<IN> doc,
java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField) |
java.util.Set<java.lang.String> |
labels() |
void |
loadClassifier(java.io.File file) |
void |
loadClassifier(java.io.File file,
java.util.Properties props)
Loads a classifier from the file specified.
|
void |
loadClassifier(java.io.InputStream in)
Load a classifier from the specified InputStream.
|
void |
loadClassifier(java.io.InputStream in,
java.util.Properties props)
Load a classifier from the specified InputStream.
|
abstract void |
loadClassifier(java.io.ObjectInputStream in,
java.util.Properties props)
Load a classifier from the specified input stream.
|
void |
loadClassifier(java.lang.String loadPath)
Loads a classifier from the file specified by loadPath.
|
void |
loadClassifier(java.lang.String loadPath,
java.util.Properties props)
Loads a classifier from the file specified by loadPath.
|
void |
loadClassifierNoExceptions(java.io.File file) |
void |
loadClassifierNoExceptions(java.io.File file,
java.util.Properties props) |
void |
loadClassifierNoExceptions(java.io.InputStream in,
java.util.Properties props)
Loads a classifier from the given input stream.
|
void |
loadClassifierNoExceptions(java.lang.String loadPath) |
void |
loadClassifierNoExceptions(java.lang.String loadPath,
java.util.Properties props) |
void |
loadJarClassifier(java.lang.String modelName,
java.util.Properties props)
This function will load a classifier that is stored inside a jar file (if
it is so stored).
|
ObjectBank<java.util.List<IN>> |
makeObjectBankFromFile(java.lang.String filename) |
ObjectBank<java.util.List<IN>> |
makeObjectBankFromFile(java.lang.String filename,
DocumentReaderAndWriter<IN> readerAndWriter) |
ObjectBank<java.util.List<IN>> |
makeObjectBankFromFiles(java.util.Collection<java.io.File> files,
DocumentReaderAndWriter<IN> readerAndWriter) |
ObjectBank<java.util.List<IN>> |
makeObjectBankFromFiles(java.lang.String[] trainFileList,
DocumentReaderAndWriter<IN> readerAndWriter) |
ObjectBank<java.util.List<IN>> |
makeObjectBankFromFiles(java.lang.String baseDir,
java.lang.String filePattern,
DocumentReaderAndWriter<IN> readerAndWriter) |
ObjectBank<java.util.List<IN>> |
makeObjectBankFromReader(java.io.BufferedReader in,
DocumentReaderAndWriter<IN> readerAndWriter)
Set up an ObjectBank that will allow one to iterate over a collection of
documents obtained from the passed in Reader.
|
ObjectBank<java.util.List<IN>> |
makeObjectBankFromString(java.lang.String string,
DocumentReaderAndWriter<IN> readerAndWriter)
Reads a String into an ObjectBank object.
|
DocumentReaderAndWriter<IN> |
makePlainTextReaderAndWriter()
Makes a DocumentReaderAndWriter based on
flags.plainTextReaderAndWriter.
|
DocumentReaderAndWriter<IN> |
makeReaderAndWriter()
Makes a DocumentReaderAndWriter based on the flags the CRFClassifier
was constructed with.
|
static void |
outputCalibrationInfo(java.io.PrintWriter pw,
Counter<java.lang.Integer> calibration,
Counter<java.lang.Integer> correctByBin,
TwoDimensionalCounter<java.lang.Integer,java.lang.String> calibratedTokens) |
DocumentReaderAndWriter<IN> |
plainTextReaderAndWriter() |
protected void |
printFeatureLists(IN wi,
java.util.Collection<java.util.List<java.lang.String>> features)
Print the String features generated from a token.
|
protected void |
printFeatures(IN wi,
java.util.Collection<java.lang.String> features)
Print the String features generated from an IN.
|
void |
printProbs(java.util.Collection<java.io.File> testFiles,
DocumentReaderAndWriter<IN> readerWriter)
Takes the files, reads them in, and prints out the likelihood of each possible
label at each point.
|
void |
printProbs(java.lang.String filename,
DocumentReaderAndWriter<IN> readerAndWriter)
Takes the file, reads it in, and prints out the likelihood of each possible
label at each point.
|
Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> |
printProbsDocument(java.util.List<IN> document) |
void |
printProbsDocuments(ObjectBank<java.util.List<IN>> documents)
Takes a List of documents and prints the likelihood of each
possible label at each point. |
static void |
printResults(Counter<java.lang.String> entityTP,
Counter<java.lang.String> entityFP,
Counter<java.lang.String> entityFN)
Given counters of true positives, false positives, and false
negatives, prints out precision, recall, and f1 for each key.
|
protected void |
reinit()
This method should be called after there have been changes to the flags
(SeqClassifierFlags) variable, such as after deserializing a classifier.
|
java.util.List<java.lang.String> |
segmentString(java.lang.String sentence)
Only use this method if a Chinese word segmenter has been loaded.
|
java.util.List<java.lang.String> |
segmentString(java.lang.String sentence,
DocumentReaderAndWriter<IN> readerAndWriter) |
abstract void |
serializeClassifier(java.io.ObjectOutputStream oos)
Serialize a sequence classifier to an object output stream.
|
abstract void |
serializeClassifier(java.lang.String serializePath)
Serialize a sequence classifier to a file on the given path.
|
void |
train()
Train the classifier based on values in flags.
|
void |
train(java.util.Collection<java.util.List<IN>> docs)
Trains a classifier from a Collection of sequences.
|
abstract void |
train(java.util.Collection<java.util.List<IN>> docs,
DocumentReaderAndWriter<IN> readerAndWriter)
Trains a classifier from a Collection of sequences.
|
void |
train(java.lang.String filename) |
void |
train(java.lang.String[] trainFileList,
DocumentReaderAndWriter<IN> readerAndWriter) |
void |
train(java.lang.String filename,
DocumentReaderAndWriter<IN> readerAndWriter) |
void |
train(java.lang.String baseTrainDir,
java.lang.String trainFiles,
DocumentReaderAndWriter<IN> readerAndWriter) |
int |
windowSize() |
void |
writeAnswers(java.util.List<IN> doc,
java.io.PrintWriter printWriter,
DocumentReaderAndWriter<IN> readerAndWriter)
Write the classifications of the Sequence classifier to a writer in a
format determined by the DocumentReaderAndWriter used.
|
public SeqClassifierFlags flags
public Index<java.lang.String> classIndex
public java.util.List<FeatureFactory<IN>> featureFactories
public int windowSize
protected MaxSizeConcurrentHashSet<java.lang.String> knownLCWords
public AbstractSequenceClassifier(java.util.Properties props)
props
- See SeqClassifierFlags for known properties.
public AbstractSequenceClassifier(SeqClassifierFlags flags)
flags
- A specification of the AbstractSequenceClassifier to construct.
public DocumentReaderAndWriter<IN> defaultReaderAndWriter()
public DocumentReaderAndWriter<IN> plainTextReaderAndWriter()
protected final void reinit()
Implementation note: At the moment this method doesn't set windowSize or featureFactory, since they are being serialized separately in the file, but we should probably stop serializing them and just reinitialize them from the flags.
public java.util.Set<java.lang.String> getKnownLCWords()
public DocumentReaderAndWriter<IN> makeReaderAndWriter()
public DocumentReaderAndWriter<IN> makePlainTextReaderAndWriter()
public java.lang.String backgroundSymbol()
public java.util.Set<java.lang.String> labels()
public java.util.List<IN> classifySentence(java.util.List<? extends HasWord> sentence)
sentence
- The List of IN to be classified.
Returns the classified List of IN, with the classifier output for each token stored in its CoreAnnotations.AnswerAnnotation field.
public java.util.List<IN> classifySentenceWithGlobalInformation(java.util.List<? extends HasWord> tokenSequence, CoreMap doc, CoreMap sentence)
tokenSequence
- The List of IN to be classified.
public SequenceModel getSequenceModel(java.util.List<IN> doc)
public Counter<java.util.List<IN>> classifyKBest(java.util.List<IN> doc, java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField, int k)
doc
- The List of tokens
answerField
- The key for each token into which the label for the token will be written
k
- The number of best sequence labelings to generate
public DFSA<java.lang.String,java.lang.Integer> getViterbiSearchGraph(java.util.List<IN> doc, java.lang.Class<? extends CoreAnnotation<java.lang.String>> answerField)
public java.util.List<java.util.List<IN>> classify(java.lang.String str)
str
- A String with tokens in one or more sentences of text to be
classified.
Returns a List of classified sentences (each a List of something that
extends CoreMap).
public java.util.List<java.util.List<IN>> classifyRaw(java.lang.String str, DocumentReaderAndWriter<IN> readerAndWriter)
str
- A String with tokens in one or more sentences of text to be
classified.
Returns a List of classified sentences (each a List of something that
extends CoreMap).
public java.util.List<java.util.List<IN>> classifyFile(java.lang.String filename)
filename
- Contains the sentence(s) to be classified.
Returns a List of classified List of IN.
public java.lang.String apply(java.lang.String in)
Specified by apply in interface java.util.function.Function<java.lang.String,java.lang.String>.
public java.lang.String classifyToString(java.lang.String sentences, java.lang.String outputFormat, boolean preserveSpacing)
Classify the contents of a String to one of several String
representations that show the classes. Plain text or XML input is expected
and the PlainTextDocumentReaderAndWriter is used. The classifier
will tokenize the text and treat each sentence as a separate document.
Three of the output formats are illustrated here: slashTags
(e.g., Bill/PERSON Smith/PERSON died/O ./O), inlineXML (e.g.,
<PERSON>Bill Smith</PERSON> went to
<LOCATION>Paris</LOCATION> .), and xml, for stand-off XML (e.g.,
<wi num="0" entity="PERSON">Sue</wi> <wi num="1"
entity="O">shouted</wi> ). There is also a binary choice as to
whether the spacing between tokens of the original is preserved or whether
the (tagged) tokens are printed with a single space (for inlineXML or
slashTags) or a single newline (for xml) between each one.
Fine points: The slashTags and xml formats show tokens as transformed by any normalization processes inside the tokenizer, while inlineXML shows the tokens exactly as they appeared in the source text. When a period counts as both part of an abbreviation and as an end of sentence marker, it is included twice in the output String for slashTags or xml, but only once for inlineXML, where it is not counted as part of the abbreviation (or any named entity it is part of). For slashTags with preserveSpacing=true, there will be two successive periods such as "Jr.." The tokenized (preserveSpacing=false) output will have a space or a newline after the last token.
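The slashTags output is straightforward to read back into token/label pairs. The following is a minimal sketch, assuming whitespace-separated output (preserveSpacing=false) and labels that contain no slash; SlashTagsDemo and parse are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

final class SlashTagsDemo {
    // Splits "Bill/PERSON Smith/PERSON" into [token, label] pairs.
    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String item : tagged.trim().split("\\s+")) {
            int slash = item.lastIndexOf('/'); // the label follows the last slash
            if (slash < 0) continue;           // not a token/label item; skip it
            pairs.add(new String[] { item.substring(0, slash), item.substring(slash + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] p : parse("Bill/PERSON Smith/PERSON died/O ./O")) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

Splitting on the last slash rather than the first keeps tokens that themselves contain a slash intact.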
sentences
- The String to be classified. It will be tokenized and
divided into documents according to (heuristically
determined) sentence boundaries.
outputFormat
- The format to put the output in: one of "slashTags", "xml",
"inlineXML", "tsv", or "tabbedEntities"
preserveSpacing
- Whether to preserve the input spacing between tokens, which may
sometimes be none (true) or whether to tokenize the text and print
it with one space between each token (false)
Returns a String annotated with classification information.
public java.lang.String classifyWithInlineXML(java.lang.String sentences)
Classify the contents of a String. Plain text or XML is expected
and the PlainTextDocumentReaderAndWriter is used by default.
The classifier will treat each sentence as a separate document.
Output is in inline XML format
(e.g. <PERSON>Bill Smith</PERSON> went to
<LOCATION>Paris</LOCATION> .)
sentences
- The string to be classified
Returns a String annotated with classification information.
public java.lang.String classifyToString(java.lang.String sentences)
Classify the contents of a String to a tagged word/class String. The
PlainTextDocumentReaderAndWriter is used by default.
Output looks like: My/O name/O is/O Bill/PERSON Smith/PERSON ./O
sentences
- The String to be classified
public java.util.List<Triple<java.lang.String,java.lang.Integer,java.lang.Integer>> classifyToCharacterOffsets(java.lang.String sentences)
Classify the contents of a String to classified character offset
spans. Plain text or XML input text is expected and the
PlainTextDocumentReaderAndWriter is used by default.
Output is a (possibly empty, but not null) List of Triples. Each Triple is an entity
name, followed by beginning and ending character offsets in the original
String. Character offsets can be thought of as fenceposts between the
characters, or, like certain methods in the Java String class, as character
positions, numbered starting from 0, with the end index pointing to the
position AFTER the entity ends. That is, end - start is the length of the
entity in characters.
Fine points: Token offsets are true with respect to the source text, even though the tokenizer may internally normalize certain tokens to String representations of different lengths (e.g., " becoming `` or ''). When a period counts as both part of an abbreviation and as an end of sentence marker, and that abbreviation is part of a named entity, the reported entity string excludes the period.
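Because the end offset points just past the entity, the offsets line up with String.substring. A minimal sketch of that convention follows; Span is a hypothetical stand-in for Triple<String,Integer,Integer>, and the offsets are written by hand for illustration rather than produced by a classifier.

```java
final class OffsetDemo {
    // Hypothetical stand-in for Triple<String,Integer,Integer>: entity type, begin, end.
    static final class Span {
        final String type;
        final int begin;
        final int end;
        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Recovers the entity text; end is exclusive, matching String.substring.
    static String text(String source, Span span) {
        return source.substring(span.begin, span.end);
    }

    public static void main(String[] args) {
        String source = "Bill Smith went to Paris.";
        // Offsets chosen by hand for this sentence, not classifier output.
        Span person = new Span("PERSON", 0, 10);
        Span location = new Span("LOCATION", 19, 24);
        System.out.println(person.type + ": " + text(source, person));
        System.out.println(location.type + ": " + text(source, location));
        // end - begin is the entity length in characters.
        System.out.println(location.end - location.begin);
    }
}
```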
sentences
- The string to be classified
Returns a List of Triples, each of which gives an entity
type and the beginning and ending character offsets.
public java.util.List<java.lang.String> segmentString(java.lang.String sentence)
sentence
- The string to be classified
public java.util.List<java.lang.String> segmentString(java.lang.String sentence, DocumentReaderAndWriter<IN> readerAndWriter)
public abstract java.util.List<IN> classify(java.util.List<IN> document)
Classify a List of something that extends CoreMap.
The classifications are added in place to the items of the document,
which is also returned by this method.
document
- A List of something that extends CoreMap.
Returns the same List, but with the elements annotated with their
answers (stored under the CoreAnnotations.AnswerAnnotation key).
public abstract java.util.List<IN> classifyWithGlobalInformation(java.util.List<IN> tokenSequence, CoreMap document, CoreMap sentence)
Classify a List of something that extends CoreMap using as
additional information whatever is stored in the document and sentence.
This is needed for SUTime (NumberSequenceClassifier), which requires
the document date to resolve relative dates.
tokenSequence
document
sentence
public void finalizeClassification(CoreMap document)
document
public void train()
public void train(java.lang.String filename)
public void train(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(java.lang.String baseTrainDir, java.lang.String trainFiles, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
public void train(java.util.Collection<java.util.List<IN>> docs)
docs
- An ObjectBank or a collection of sequences of IN
public abstract void train(java.util.Collection<java.util.List<IN>> docs, DocumentReaderAndWriter<IN> readerAndWriter)
docs
- An ObjectBank or a collection of sequences of IN
readerAndWriter
- A DocumentReaderAndWriter to use when loading test files
public ObjectBank<java.util.List<IN>> makeObjectBankFromString(java.lang.String string, DocumentReaderAndWriter<IN> readerAndWriter)
string
- The String which will be the content of the ObjectBank
public ObjectBank<java.util.List<IN>> makeObjectBankFromFile(java.lang.String filename)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFile(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.lang.String[] trainFileList, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromFiles(java.util.Collection<java.io.File> files, DocumentReaderAndWriter<IN> readerAndWriter)
public ObjectBank<java.util.List<IN>> makeObjectBankFromReader(java.io.BufferedReader in, DocumentReaderAndWriter<IN> readerAndWriter)
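Set up an ObjectBank that will allow one to iterate over a collection of
documents obtained from the passed in Reader. Reading uses the method specified in
flags.documentReader, and for some
reader choices, the column mapping given in flags.map.
in
- Input data. addNEWLCWords: whether to add new lowercase words from this
data to the word shape classifier
public void printProbs(java.lang.String filename, DocumentReaderAndWriter<IN> readerAndWriter)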
filename
- The path to the specified file
public void printProbs(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter)
testFiles
- A Collection of files
public void printProbsDocuments(ObjectBank<java.util.List<IN>> documents)
Takes a List of documents and prints the likelihood of each
possible label at each point.
documents
- A List of List of something that extends CoreMap.
public static void outputCalibrationInfo(java.io.PrintWriter pw, Counter<java.lang.Integer> calibration, Counter<java.lang.Integer> correctByBin, TwoDimensionalCounter<java.lang.Integer,java.lang.String> calibratedTokens)
public void classifyStdin() throws java.io.IOException
java.io.IOException
public void classifyStdin(DocumentReaderAndWriter<IN> readerWriter) throws java.io.IOException
java.io.IOException
public Triple<Counter<java.lang.Integer>,Counter<java.lang.Integer>,TwoDimensionalCounter<java.lang.Integer,java.lang.String>> printProbsDocument(java.util.List<IN> document)
public void classifyAndWriteAnswers(java.lang.String testFile) throws java.io.IOException
testFile
- The file to test on.
java.io.IOException
public void classifyAndWriteAnswers(java.lang.String testFile, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
testFile
- The file to test on.
readerWriter
- A reader and writer to use for the output
java.io.IOException
public void classifyAndWriteAnswers(java.lang.String testFile, java.io.OutputStream outStream, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
If the flag outputEncoding is defined, the output is written in that
character encoding, otherwise in the system default character encoding.
java.io.IOException
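The encoding fallback described for the OutputStream variant can be illustrated with plain JDK classes. This is a sketch of the general pattern only, not the library's implementation; EncodingDemo and writerFor are hypothetical names, and the encodingName argument plays the role of the outputEncoding flag.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

final class EncodingDemo {
    // encodingName stands in for the outputEncoding flag; null means "not defined".
    static PrintWriter writerFor(OutputStream out, String encodingName) {
        Charset cs = (encodingName != null)
                ? Charset.forName(encodingName)   // use the requested encoding
                : Charset.defaultCharset();       // fall back to the system default
        return new PrintWriter(new OutputStreamWriter(out, cs), true);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter pw = writerFor(bytes, "UTF-8");
        pw.print("Z\u00fcrich/LOCATION"); // the u-umlaut takes two bytes in UTF-8
        pw.flush();
        System.out.println(bytes.size() + " bytes written");
    }
}
```

Passing an explicit encoding matters when the tagged output contains non-ASCII tokens and is consumed on a machine with a different default charset.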
public void classifyAndWriteAnswers(java.lang.String baseDir, java.lang.String filePattern, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
java.io.IOException
public void classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles) throws java.io.IOException
java.io.IOException
public void classifyFilesAndWriteAnswers(java.util.Collection<java.io.File> testFiles, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
java.io.IOException
public void classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
java.io.IOException
public void dumpFeatures(java.util.Collection<java.util.List<IN>> documents)
public void classifyAndWriteAnswers(java.util.Collection<java.util.List<IN>> documents, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerWriter, boolean outputScores) throws java.io.IOException
java.io.IOException
public void classifyAndWriteAnswersKBest(java.lang.String testFile, int k, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
testFile
- The name of the file to test on.
k
- How many best to print
readerAndWriter
- Class to be used for printing answers
java.io.IOException
public void classifyAndWriteAnswersKBest(ObjectBank<java.util.List<IN>> documents, int k, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
documents
- The ObjectBank to test on.
java.io.IOException
public void classifyAndWriteViterbiSearchGraph(java.lang.String testFile, java.lang.String searchGraphPrefix, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
testFile
- The file to test on.
java.io.IOException
public void writeAnswers(java.util.List<IN> doc, java.io.PrintWriter printWriter, DocumentReaderAndWriter<IN> readerAndWriter) throws java.io.IOException
doc
- Documents to write out
printWriter
- Writer to use for output
java.io.IOException
- If an IO problem
public boolean countResults(java.util.List<IN> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
public static boolean countResultsSegmenter(java.util.List<? extends CoreMap> doc, Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
public static void printResults(Counter<java.lang.String> entityTP, Counter<java.lang.String> entityFP, Counter<java.lang.String> entityFN)
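printResults reports precision, recall, and F1 for each key from the three counters. As a reminder of the arithmetic, here is a minimal sketch using plain maps in place of Counter<String>; PrfDemo and prf are hypothetical names, the counts are made up, and returning 0.0 for a zero denominator is a convention chosen for this sketch rather than documented library behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class PrfDemo {
    // Returns {precision, recall, f1}; a zero denominator yields 0.0 (sketch convention).
    static double[] prf(double tp, double fp, double fn) {
        double precision = (tp + fp == 0) ? 0.0 : tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : tp / (tp + fn);
        double f1 = (precision + recall == 0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] { precision, recall, f1 };
    }

    public static void main(String[] args) {
        // Plain maps stand in for Counter<String>; the counts are illustrative only.
        Map<String, Double> tp = new HashMap<>();
        Map<String, Double> fp = new HashMap<>();
        Map<String, Double> fn = new HashMap<>();
        tp.put("PERSON", 8.0);   fp.put("PERSON", 2.0);   fn.put("PERSON", 2.0);
        tp.put("LOCATION", 3.0); fp.put("LOCATION", 1.0); fn.put("LOCATION", 3.0);
        for (String key : new TreeSet<>(tp.keySet())) {
            double[] r = prf(tp.get(key), fp.getOrDefault(key, 0.0), fn.getOrDefault(key, 0.0));
            System.out.printf("%s\tP=%.4f\tR=%.4f\tF1=%.4f%n", key, r[0], r[1], r[2]);
        }
    }
}
```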
public abstract void serializeClassifier(java.lang.String serializePath)
serializePath
- The path/filename to write the classifier to.
public abstract void serializeClassifier(java.io.ObjectOutputStream oos)
public void loadClassifierNoExceptions(java.io.InputStream in, java.util.Properties props)
in
- The InputStream to read from
public void loadClassifier(java.io.InputStream in) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
in
- The InputStream to load the serialized classifier from
java.io.IOException
- If there are problems accessing the input stream
java.lang.ClassCastException
- If there are problems interpreting the serialized data
java.lang.ClassNotFoundException
- If there are problems interpreting the serialized data
public void loadClassifier(java.io.InputStream in, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
in
- The InputStream to load the serialized classifier from
props
- This Properties object will be used to update the
SeqClassifierFlags which are read from the serialized classifier
java.io.IOException
- If there are problems accessing the input stream
java.lang.ClassCastException
- If there are problems interpreting the serialized data
java.lang.ClassNotFoundException
- If there are problems interpreting the serialized data
public abstract void loadClassifier(java.io.ObjectInputStream in, java.util.Properties props) throws java.io.IOException, java.lang.ClassCastException, java.lang.ClassNotFoundException
in
- The InputStream to load the serialized classifier from
props
- This Properties object will be used to update the
SeqClassifierFlags which are read from the serialized classifier
java.io.IOException
- If there are problems accessing the input stream
java.lang.ClassCastException
- If there are problems interpreting the serialized data
java.lang.ClassNotFoundException
- If there are problems interpreting the serialized data
public void loadClassifier(java.lang.String loadPath) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException
public void loadClassifier(java.lang.String loadPath, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException
public void loadClassifierNoExceptions(java.lang.String loadPath)
public void loadClassifierNoExceptions(java.lang.String loadPath, java.util.Properties props)
public void loadClassifier(java.io.File file) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
java.lang.ClassCastException
java.io.IOException
java.lang.ClassNotFoundException
public void loadClassifier(java.io.File file, java.util.Properties props) throws java.lang.ClassCastException, java.io.IOException, java.lang.ClassNotFoundException
file
- Loads a classifier from this file.
props
- Properties in this object will be used to overwrite those
specified in the serialized classifier
java.io.IOException
- If there are problems accessing the input stream
java.lang.ClassCastException
- If there are problems interpreting the serialized data
java.lang.ClassNotFoundException
- If there are problems interpreting the serialized data
public void loadClassifierNoExceptions(java.io.File file)
public void loadClassifierNoExceptions(java.io.File file, java.util.Properties props)
public void loadJarClassifier(java.lang.String modelName, java.util.Properties props)
modelName
- The name of the model file. Iff it ends in .gz, then it is assumed
to be gzip compressed.
props
- A Properties object which can override certain properties in the
serialized file, such as the DocumentReaderAndWriter. You can pass
in null to override nothing.
protected void printFeatures(IN wi, java.util.Collection<java.lang.String> features)
protected void printFeatureLists(IN wi, java.util.Collection<java.util.List<java.lang.String>> features)
public int windowSize()