Weka Incrementelles Clustering NullPointerException

SnUgEr · 7. April 2011

Hallo,

ich habe Weka verwendet und erhalte immer eine NullPointerException

Code:

ERROR [main] (CobwebClusterer.java:57) - Should normally never happen!
java.lang.NullPointerException
        at weka.core.Instances.initialize(Instances.java:194)
        at weka.core.Instances.<init>(Instances.java:178)
        at weka.clusterers.Cobweb$CNode.<init>(Cobweb.java:175)
        at weka.clusterers.Cobweb$CNode.addInstance(Cobweb.java:210)
        at weka.clusterers.Cobweb.updateClusterer(Cobweb.java:1010)
        at cobweb.CobwebClusterer.incCluster(CobwebClusterer.java:55)
        at cobweb.CobwebClustererMain.main(CobwebClustererMain.java:59)
java.lang.Exception: assignClusterNums: tree not built correctly!
        at weka.clusterers.Cobweb$CNode.assignClusterNums(Cobweb.java:631)
        at weka.clusterers.Cobweb$CNode.access$300(Cobweb.java:122)
        at weka.clusterers.Cobweb.determineNumberOfClusters(Cobweb.java:976)
        at weka.clusterers.Cobweb.updateFinished(Cobweb.java:928)
        at cobweb.CobwebClusterer.updateFinished(CobwebClusterer.java:61)
        at cobweb.CobwebClustererMain.main(CobwebClustererMain.java:60)

Mein Code sieht wie folgt aus:

Code:

public class CobwebClustererMain {

    public static void main(String[] args) {
        CobwebClusterer cobweb;
        DualHashBidiMap map;
        ExtractedDocument document;
        List<StringToken> token;

        //Setzte daten
        map = new DualHashBidiMap();
        map.put("hello", 0);
        map.put("world", 1);
        map.put("hell", 2);

        cobweb = new CobwebClusterer();
        cobweb.setAttributeMap(map);

        //1. Erzeuge Tabelle
        cobweb.createDataTable(2);
        //2. Füge Daten ein
        //Document 1
        token = new ArrayList<StringToken>();
        token.add(new StringToken(1, "hello"));
        token.add(new StringToken(1, "world"));
        document = new ExtractedDocument(""+0, token);
        cobweb.buildDataRows(document);
        //Document 2
        token = new ArrayList<StringToken>();
        token.add(new StringToken(1, "hello"));
        token.add(new StringToken(1, "hell"));
        document = new ExtractedDocument(""+1, token);
        cobweb.buildDataRows(document);
        cobweb.clusterAll();

        System.out.println("ready 1");


        //2. Update
        token = new ArrayList<StringToken>();
        token.add(new StringToken(1, "world"));
        token.add(new StringToken(1, "hell"));
        document = new ExtractedDocument(""+0, token);
        cobweb.incCluster(document); //<- Hier wird Exception geworfen
        cobweb.updateFinished();

        System.out.println("ready 2");
    }
}

Die Fasadeklasse:

Code:

public class CobwebClusterer {
    private static final Logger logger = Logger.getLogger(
            CobwebClusterer.class);

    /**
     * The cluster algorithm form the weka.
     */
    private Cobweb cobweb;
    /**
     * Contains a mapping from word string to position inside.
     */
    private transient DualHashBidiMap attributeMap;
    /**
     * The table of the current cluster algorithm.
     */
    private transient Instances table;
    

    public void incCluster(ExtractedDocument newDocument) {
        Instance instance;
        Map<String, Integer> usage;

        //get the usage of every word
        usage = this.countWordUsage(newDocument);

        //create the data line
        instance = this.createDataRow(usage);

        //add the new data line
        try {
            this.cobweb.updateClusterer(instance); //<- Hier wird Exception geworfen
        } catch(Exception ex) {
            logger.error("Should normally never happen!", ex);
        }
    }
    public void updateFinished() {
        this.cobweb.updateFinished();
    }
    public void createDataTable(int size) {
        table = this.createTable(size);
    }
    public void buildDataRows(ExtractedDocument newDocument) {
        Map<String, Integer> wordCount;
        Instance dataRow;

        //Count the words
        wordCount = this.countWordUsage(newDocument);
        //Create the data row
        dataRow = this.createDataRow(wordCount);
        //insert the data row into the table
        table.add(dataRow);
    }
    public void clusterAll() {
        this.cobweb = new Cobweb();
        try {
            this.cobweb.buildClusterer(table);
        } catch(Exception ex) {
            logger.error(ex);
        }
    }
    /**
     * Creates a table for clustering. It will create as much attributes as
     * available in the attributeMap. Please remember the name of the attribute
     * is equals to the attributeMap value. The value is only a number not the
     * word itself.
     * @param size The number of data rows.
     * @return The new created table.
     */
    private Instances createTable(int size) {
        Attribute attribute;
        Instances dataTable;
        ArrayList<Attribute> attributes;

        //create the list of attributes for the table
        attributes = new ArrayList<Attribute>(attributeMap.size());
        for(int i = 0; i < attributeMap.size(); i++) {
            //Create the attribute with the name
            attribute = new Attribute(i+"");
            //add it to the list
            attributes.add(attribute);
        }
        dataTable = new Instances("", attributes, size);


        return dataTable;
    }
    /**
     * Counts the usage of every word.
     * @param extractedDocument The document, which words should be counted.
     * @return The counted words. The key is the word and the value is the
     * corresponding usage of the word inside the text.
     */
    private Map<String, Integer> countWordUsage(
            ExtractedDocument extractedDocument) {
        Integer count;
        Map<String, Integer> stringCount;

        //2. copy the data into the clustering algorithm
        //create the result array for every document
        stringCount = new HashMap<String, Integer>();
        for(StringToken tokenIter : extractedDocument.getExtractedTokens()) {
            //count the usage of the string
            count = stringCount.get(tokenIter.getToken());
            if(count == null) {
                stringCount.put(tokenIter.getToken(), 1);
            } else {
                stringCount.put(tokenIter.getToken(), count+1);
            }
        }

        return stringCount;
    }
    /**
     * Creates a single data row.
     * @param wordUsage The data, for witch the row must be created.
     * @return The new created data row.
     */
    private Instance createDataRow(Map<String, Integer> wordUsage) {
        Integer pos;
        Instance instance;

        //Create the data row
        instance = new SparseInstance(this.attributeMap.size());

        //copy every word
        for(Entry<String, Integer> wordIter : wordUsage.entrySet()) {
            //get the position of the token
            pos = (Integer) attributeMap.get(wordIter.getKey());
            instance.setValueSparse(pos, wordIter.getValue());
        }
        for(Double iter : instance.toDoubleArray())
            System.out.print(iter + " ");
        System.out.println();

        return instance;
    }


    public DualHashBidiMap getAttributeMap() {
        return attributeMap;
    }

    public void setAttributeMap(DualHashBidiMap attributeMap) {
        this.attributeMap = attributeMap;
    }
}

Hat jemand eine Lösung? Ich suche schon seit fast 3 Stunden.
Problem ist eben es wird in buildClusterer() und incCluster() jeweils die selben internen Methoden verwendet. Nur mit dem Unterschied das zum Schluss die Daten in die Tabelle (buildClusterer()) oder direkt in den Cluster-Algorithmus (incCluster()) eingefügt werden.

SnUgEr · 7. April 2011

Hi,

das Problem taucht sogar in einem kleineren Beispiel auf:

Code:

    public static void main(String[] args) throws Exception {
        Cobweb cobweb;
        Instances instances;
        Instance instance;
        ArrayList<Attribute> attributes;
        ArrayList<String> types;

        attributes = new ArrayList<Attribute>();

        //Da nominale Daten benötigt werden
        types = new ArrayList<String>();
        types.add("false");
        types.add("yes");
        attributes.add(new Attribute("hello", types));
        types = new ArrayList<String>();
        types.add("false");
        types.add("yes");
        attributes.add(new Attribute("world", types));
        types = new ArrayList<String>();
        types.add("false");
        types.add("yes");
        attributes.add(new Attribute("hell", types));


        instances = new Instances("test", attributes, 3);

        cobweb = new Cobweb();
        cobweb.buildClusterer(instances);


        instance = new DenseInstance(3);
        instance.setValue(0, 1);
        instance.setValue(1, 1);
        instance.setValue(2, 0);
        cobweb.updateClusterer(instance); //<- Exception

        instance = new DenseInstance(3);
        instance.setValue(0, 1);
        instance.setValue(1, 0);
        instance.setValue(2, 1);
        cobweb.updateClusterer(instance);

        cobweb.updateFinished();

Weka Incrementelles Clustering NullPointerException

SnUgEr

Grünschnabel

SnUgEr

Grünschnabel

Neue Beiträge