WordNet BFS Shortest Path

WordNet is a graph which represents a set of words, 
Synset can be interpreted as a set of words, which have same/similar meaning, 

For example AND_circuit AND_gate, have similar meaning "a circuit in a computer that fires only when all of its inputs fire"

The Hypernym of a synset is a more general form of this synset, for example, "gate" is the hypernym of synset "AND_circuit AND_gate" and "OR_circuit OR_gate"

(from priceton wordnet digraph)

To find the distance between two synsets, we can use BFS, 
First of all we need to build graph, after that we can run BFS
1. run BFS on synset1, Complexity (V+E), we can get distance from synset1 to each synset
2. run BFS on synset2, Complexity (V+E), we can get distance from synset2 to each synset
3. Iterate through all synset and calculate the minimum of distance_to_synset1 + distance_to_synset2
Complexity (V)

Thus the total complexity is (V+E)

In reality a word may have multiple meanings, thus a word may have multiple synsets

For example:

The definition of bed
1. (geology) a stratum of rock (especially sedimentary rock); "they found a bed of sandstone" 
2. a foundation of earth or rock supporting a road or railroad track; "the track bed had washed away"
3. a piece of furniture that provides a place to sleep; "he sat on the edge of the bed"; "the room had only a bed and chair" 

So, the BFS will start from multiple sources, each of which has different meaning. It will be applied for both synset1 and synset2


(V is vertex: all the synsets; E is edge, all the connections between synsets)
One Application The usage of wordNet
we can use it to find the word in a set of words, which is least related to other words in the set, by calculating the sum of distance of each word with other words in the set, and find the maximum.




评论

此博客中的热门博文

Embedded System interview Question

MicroKernel & Exokernel 操作系统未来可能的发展

中国城市房地产分析