Hello,
I'm using Weka and keep getting a NullPointerException.
Code:
ERROR [main] (CobwebClusterer.java:57) - Should normally never happen!
java.lang.NullPointerException
    at weka.core.Instances.initialize(Instances.java:194)
    at weka.core.Instances.<init>(Instances.java:178)
    at weka.clusterers.Cobweb$CNode.<init>(Cobweb.java:175)
    at weka.clusterers.Cobweb$CNode.addInstance(Cobweb.java:210)
    at weka.clusterers.Cobweb.updateClusterer(Cobweb.java:1010)
    at cobweb.CobwebClusterer.incCluster(CobwebClusterer.java:55)
    at cobweb.CobwebClustererMain.main(CobwebClustererMain.java:59)

java.lang.Exception: assignClusterNums: tree not built correctly!
    at weka.clusterers.Cobweb$CNode.assignClusterNums(Cobweb.java:631)
    at weka.clusterers.Cobweb$CNode.access$300(Cobweb.java:122)
    at weka.clusterers.Cobweb.determineNumberOfClusters(Cobweb.java:976)
    at weka.clusterers.Cobweb.updateFinished(Cobweb.java:928)
    at cobweb.CobwebClusterer.updateFinished(CobwebClusterer.java:61)
    at cobweb.CobwebClustererMain.main(CobwebClustererMain.java:60)
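If I read the trace right, the NPE comes out of the Instances copy constructor that Cobweb$CNode invokes internally, and that constructor seems to copy the header of instance.dataset(). My assumption (not verified against the Weka sources): the Instance I hand to updateClusterer() simply carries no dataset reference, because a freshly created SparseInstance returns null there:
Code:
import weka.core.Instance;
import weka.core.SparseInstance;

public class DatasetCheck {
    public static void main(String[] args) {
        //a freshly created instance has no backing dataset yet
        Instance instance = new SparseInstance(3);
        System.out.println(instance.dataset()); //prints "null"
    }
}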
My code looks like this:
Code:
public class CobwebClustererMain {

    public static void main(String[] args) {
        CobwebClusterer cobweb;
        DualHashBidiMap map;
        ExtractedDocument document;
        List<StringToken> token;
        //set up the data
        map = new DualHashBidiMap();
        map.put("hello", 0);
        map.put("world", 1);
        map.put("hell", 2);
        cobweb = new CobwebClusterer();
        cobweb.setAttributeMap(map);
        //1. create the table
        cobweb.createDataTable(2);
        //2. insert the data
        //document 1
        token = new ArrayList<StringToken>();
        token.add(new StringToken(1, "hello"));
        token.add(new StringToken(1, "world"));
        document = new ExtractedDocument(""+0, token);
        cobweb.buildDataRows(document);
        //document 2
        token = new ArrayList<StringToken>();
        token.add(new StringToken(1, "hello"));
        token.add(new StringToken(1, "hell"));
        document = new ExtractedDocument(""+1, token);
        cobweb.buildDataRows(document);
        cobweb.clusterAll();
        System.out.println("ready 1");
        //3. update
        token = new ArrayList<StringToken>();
        token.add(new StringToken(1, "world"));
        token.add(new StringToken(1, "hell"));
        document = new ExtractedDocument(""+0, token);
        cobweb.incCluster(document); //<- exception is thrown here
        cobweb.updateFinished();
        System.out.println("ready 2");
    }
}
The facade class:
Code:
public class CobwebClusterer {

    private static final Logger logger = Logger.getLogger(
            CobwebClusterer.class);

    /**
     * The clustering algorithm from Weka.
     */
    private Cobweb cobweb;

    /**
     * Contains a mapping from word string to attribute position.
     */
    private transient DualHashBidiMap attributeMap;

    /**
     * The table of the current cluster algorithm.
     */
    private transient Instances table;

    public void incCluster(ExtractedDocument newDocument) {
        Instance instance;
        Map<String, Integer> usage;
        //get the usage of every word
        usage = this.countWordUsage(newDocument);
        //create the data line
        instance = this.createDataRow(usage);
        //add the new data line
        try {
            this.cobweb.updateClusterer(instance); //<- exception is thrown here
        } catch(Exception ex) {
            logger.error("Should normally never happen!", ex);
        }
    }

    public void updateFinished() {
        this.cobweb.updateFinished();
    }

    public void createDataTable(int size) {
        table = this.createTable(size);
    }

    public void buildDataRows(ExtractedDocument newDocument) {
        Map<String, Integer> wordCount;
        Instance dataRow;
        //count the words
        wordCount = this.countWordUsage(newDocument);
        //create the data row
        dataRow = this.createDataRow(wordCount);
        //insert the data row into the table
        table.add(dataRow);
    }

    public void clusterAll() {
        this.cobweb = new Cobweb();
        try {
            this.cobweb.buildClusterer(table);
        } catch(Exception ex) {
            logger.error(ex);
        }
    }

    /**
     * Creates a table for clustering. It creates as many attributes as there
     * are entries in the attributeMap. Note that the attribute name equals
     * the attributeMap value, i.e. a number, not the word itself.
     * @param size The number of data rows.
     * @return The newly created table.
     */
    private Instances createTable(int size) {
        Attribute attribute;
        Instances dataTable;
        ArrayList<Attribute> attributes;
        //create the list of attributes for the table
        attributes = new ArrayList<Attribute>(attributeMap.size());
        for(int i = 0; i < attributeMap.size(); i++) {
            //create the attribute with the name
            attribute = new Attribute(i+"");
            //add it to the list
            attributes.add(attribute);
        }
        dataTable = new Instances("", attributes, size);
        return dataTable;
    }

    /**
     * Counts the usage of every word.
     * @param extractedDocument The document whose words should be counted.
     * @return The counted words. The key is the word and the value is the
     * number of times the word occurs in the text.
     */
    private Map<String, Integer> countWordUsage(
            ExtractedDocument extractedDocument) {
        Integer count;
        Map<String, Integer> stringCount;
        //create the result map for the document
        stringCount = new HashMap<String, Integer>();
        for(StringToken tokenIter : extractedDocument.getExtractedTokens()) {
            //count the usage of the string
            count = stringCount.get(tokenIter.getToken());
            if(count == null) {
                stringCount.put(tokenIter.getToken(), 1);
            } else {
                stringCount.put(tokenIter.getToken(), count+1);
            }
        }
        return stringCount;
    }

    /**
     * Creates a single data row.
     * @param wordUsage The data for which the row must be created.
     * @return The newly created data row.
     */
    private Instance createDataRow(Map<String, Integer> wordUsage) {
        Integer pos;
        Instance instance;
        //create the data row
        instance = new SparseInstance(this.attributeMap.size());
        //copy every word
        for(Entry<String, Integer> wordIter : wordUsage.entrySet()) {
            //get the position of the token
            pos = (Integer) attributeMap.get(wordIter.getKey());
            instance.setValueSparse(pos, wordIter.getValue());
        }
        for(Double iter : instance.toDoubleArray())
            System.out.print(iter + " ");
        System.out.println();
        return instance;
    }

    public DualHashBidiMap getAttributeMap() {
        return attributeMap;
    }

    public void setAttributeMap(DualHashBidiMap attributeMap) {
        this.attributeMap = attributeMap;
    }
}
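For comparison, here is a minimal, self-contained sketch of how I understand the incremental API from the Weka Javadoc (attribute names made up, not verified against every Weka version):
Code:
import java.util.ArrayList;

import weka.clusterers.Cobweb;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

public class CobwebMinimal {
    public static void main(String[] args) throws Exception {
        //header with two numeric attributes
        ArrayList<Attribute> attributes = new ArrayList<Attribute>();
        attributes.add(new Attribute("0"));
        attributes.add(new Attribute("1"));
        Instances header = new Instances("docs", attributes, 0);
        //build on the (empty) header, then feed instances one by one
        Cobweb cobweb = new Cobweb();
        cobweb.buildClusterer(header);
        Instance instance = new SparseInstance(2);
        instance.setValue(0, 1.0);
        instance.setValue(1, 2.0);
        //the instance has to know its dataset before the update
        instance.setDataset(header);
        cobweb.updateClusterer(instance);
        cobweb.updateFinished();
    }
}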
Does anyone have a solution? I've been searching for almost three hours now.
The thing is that buildClusterer() and incCluster() both go through the same internal methods; the only difference is that at the end the data is inserted either into the table (buildClusterer()) or directly into the clustering algorithm (incCluster()).
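If my reading of the stack trace above is right, that last step is exactly where it goes wrong: table.add() in the build path seems to set the dataset on (a copy of) the row itself, while the update path never attaches the header. So my current guess (untested sketch) would be:
Code:
public void incCluster(ExtractedDocument newDocument) {
    Map<String, Integer> usage = this.countWordUsage(newDocument);
    Instance instance = this.createDataRow(usage);
    //guess: attach the table header, because updateClusterer()
    //apparently dereferences instance.dataset() internally
    instance.setDataset(table);
    try {
        this.cobweb.updateClusterer(instance);
    } catch(Exception ex) {
        logger.error("Should normally never happen!", ex);
    }
}
Can anyone confirm that this is the right way to use updateClusterer()?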