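Text Analysis Using NLTK Commands
NLTK (the Natural Language Toolkit) is a Python library that covers most of the basic tasks in text analysis.
The steps are:
a) Install NLTK from the command line: pip install nltk
b) Then, in the Python interpreter, import the sample texts using the command given below. The texts belong to NLTK's 'book' data collection, which has to be downloaded once beforehand with nltk.download('book').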
>>> from nltk.book import *
Output:
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
Some of the text analysis commands are:
>>> set(text1)  # the distinct tokens (words and punctuation) in the text, in no particular order
Output: {'ash', 'islands', 'extinct', 'Ordinaire', 'Terrible', 'mutually', 'lengthen', 'since', 'nick', 'stature', 'DISSECTION', 'boat', 'Unconsciously', 'contenting', 'Plum', 'Humane', 'membranes', 'necessitated', 'Dorchester', 'Unappalled', 'sufficiently', 'invunerable', 'touchy', 'Bad', ...
>>> sorted(set(text1))  # the distinct tokens as a sorted list; sorted(text1) alone would keep every repeated token
Output: [..., 'quiver', 'quivered', 'quivering', 'quivers', 'quoggy', 'quohogs', 'quoin', 'quoins', 'quote', 'quoted', 'raal', 'rabble', 'rabid', 'race', 'raced', 'races', 'racing', 'rack', 'racket', 'radiance', 'radiant', 'radiates', 'radiating', 'radical', 'rafted', 'rafters', 'rafts', 'rag', ...
>>> len(text1)  # total number of tokens (words and punctuation) in text1
260819
>>> len(set(text1))  # number of distinct tokens in text1
19317
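The two counts above are often combined into a single ratio of distinct tokens to total tokens. For text1 that is 19317 / 260819, or roughly 0.074. Here is a minimal sketch of the same calculation on a made-up token list, so it runs without NLTK; the function name lexical_diversity and the sample sentence are my own choices, not part of the session above.

```python
# Hypothetical mini-corpus standing in for an NLTK text such as text1.
tokens = "the whale and the sea and the ship".split()


def lexical_diversity(tokens):
    """Return the number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens)


# sorted(set(...)) gives the vocabulary as an alphabetically ordered list.
vocabulary = sorted(set(tokens))

print(len(tokens))                # total tokens
print(len(vocabulary))            # distinct tokens
print(lexical_diversity(tokens))  # share of tokens that are distinct
```

Because text1 behaves like a list of tokens in the calls above, passing it to this function should work the same way.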
>>> text1.collocations()  # pairs of words that occur together unusually often
Sperm Whale; Moby Dick; White Whale; old man; Captain Ahab; sperm
whale; Right Whale; Captain Peleg; New Bedford; Cape Horn; cried Ahab;
years ago; lower jaw; never mind; Father Mapple; cried Stubb; chief
mate; white whale; ivory leg; one hand
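To get a feel for where such pairs come from, here is a rough sketch that simply counts adjacent word pairs in a made-up token list. This is plain frequency counting, not NLTK's method: as far as I know collocations() ranks pairs with a statistical measure and filters out very common words, so the results on real text will differ.

```python
from collections import Counter

# Hypothetical token list; in practice this would be an NLTK text.
tokens = "the white whale saw the white whale and the old man".split()

# Pair every token with the token that follows it.
bigrams = list(zip(tokens, tokens[1:]))

# Count how often each adjacent pair occurs.
bigram_counts = Counter(bigrams)

# Show the pairs that occur more than once.
for pair, count in bigram_counts.items():
    if count > 1:
        print(" ".join(pair), count)
```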
>>> text2.count('wrong')  # number of times the word occurs in the whole text
22
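A raw count is easier to compare across texts when it is expressed as a share of all tokens. The sketch below does that on a made-up sentence and also uses collections.Counter from the standard library to count every token in one pass; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical token list standing in for text2.
tokens = "right or wrong , it was wrong to stay".split()

# Count one word, the same way text2.count('wrong') does.
wrong_count = tokens.count("wrong")

# Count every token at once.
counts = Counter(tokens)

# Express the count as a percentage of all tokens.
share = 100 * wrong_count / len(tokens)

print(wrong_count)
print(counts.most_common(1))
print(round(share, 1))
```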
>>> text2.concordance('right')  # shows each occurrence of the word with its surrounding context
Displaying 25 of 32 matches:
ttendants . No one could dispute her right to come ; the house was her husband
ded at the time . Had he been in his right senses , he could not have thought o
own expenses ." " I believe you are right , my love ; it will be better that t
wood , " I believe you are perfectly right . My father certainly could mean not
hich in general direct him perfectly right ." Marianne was afraid of offending
mmonly moderate , as to leave her no right of objection on either point ; and ,
one else . Every thing he did , was right . Every thing he said , was clever .
our conjectures may be , you have no right to repeat them ." " I never had any
s . Mrs . Jennings sat on Elinor ' s right hand ; and they had not been long se
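The idea behind a concordance can be sketched in a few lines of plain Python: find every position of the word and print a window of tokens around it. This is only an illustration under my own assumptions (a fixed window of three tokens, case ignored, the match shown in brackets); it does not reproduce NLTK's character-aligned layout.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of word with up to `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical token list; in practice this would be an NLTK text.
tokens = "You are right , my love ; she had no right to come".split()

for line in concordance(tokens, "right"):
    print(line)
```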
>>> text2.similar('good')  # words that appear in contexts similar to those of 'good'
large short long young great much comfortable kind quiet pretty
charming in respectable as to house one that time thing
>>> text4.index('the')  # index of the first occurrence of 'the' in the list of tokens
4
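index() reports only the first occurrence. If every position is needed, a list comprehension with enumerate() does the job. The sketch below uses a made-up token list rather than text4.

```python
# Hypothetical token list; indexes start at 0.
tokens = "of the people , by the people".split()

# Position of the first occurrence only.
first = tokens.index("the")

# Positions of all occurrences.
all_positions = [i for i, token in enumerate(tokens) if token == "the"]

print(first)
print(all_positions)

# index() raises ValueError for a word that is absent, so check membership first.
if "whale" in tokens:
    print(tokens.index("whale"))
else:
    print("not found")
```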
>>> text5[:50]  # the first 50 tokens of text5
['now', 'im', 'left', 'with', 'this', 'gay', 'name', ':P', 'PART', 'hey', 'everyone', 'ah', 'well', 'NICK', ':', 'U7', 'U7', 'is', 'a', 'gay', 'name', '.', '.', 'ACTION', 'gives', 'U121', 'a', 'golf', 'clap', '.', ':)', 'JOIN', 'hi', 'U59', '26', '/', 'm', '/', 'ky', 'women', 'that', 'are', 'nice', 'please', 'pm', 'me', 'JOIN', 'PART', 'there', 'ya']
>>> text5[1200:1268]  # tokens from index 1200 up to, but not including, index 1268
['U116', 'PART', 'U7', 'PART', 'there', 'is', 'not', '!', 'heyy', 'U148', 'i', 'hate', 'you', '.', 'boys', 'are', 'naughtier', 'U92', '.', 'JOIN', 'bye', 'U148', 'Hmm', 'you', 'I', 'hate', 'you', 'say', '..', 'Guess', 'what', 'PART', 'i', 'hate', 'you', 'U121', 'fuck', 'your', 'ugly', 'JOIN', 'if', 'i', 'had', 'a', 'daughter', 'she', 'would', 'regret', 'me', 'bein', 'her', 'dad', 'huh', '?', 'Hmm', 'PART', 'What', '?', 'aw', 'U115', 'whys', 'that', 'deep', 'inside', 'U121', 'wants', 'what', 'she']
>>> text6[-60:-20]  # tokens from the 60th from the end up to, but not including, the 20th from the end
['s', 'an', 'offensive', 'weapon', ',', 'that', 'is', '.', 'OFFICER', '#', '2', ':', 'Come', 'on', '.', 'Back', 'with', "'", 'em', '.', 'Back', '.', 'Right', '.', 'Come', 'along', '.', 'INSPECTOR', ':', 'Everything', '?', '[', 'squeak', ']', 'OFFICER', '#', '1', ':', 'All', 'right']
>>> text7[1234]  # the token at index 1234 in text7
'the'
>>> text9[13900]
'.'
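The slicing rules used above are the ordinary Python list rules: the start index is included, the end index is excluded, and negative indexes count back from the end. A small made-up list makes this easy to check by eye.

```python
# Hypothetical token list with easy-to-follow contents.
tokens = ["a", "b", "c", "d", "e", "f", "g", "h"]

first_three = tokens[:3]     # indexes 0, 1, 2
middle = tokens[2:5]         # indexes 2, 3, 4 (index 5 is excluded)
near_end = tokens[-4:-1]     # 4th from the end up to, not including, the last
last = tokens[-1]            # a single index returns one token, not a list

print(first_three)
print(middle)
print(near_end)
print(last)
```

The length of a slice is simply end minus start, which is why text5[1200:1268] returns 68 tokens.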
>>> ' '.join(['Raj', 'Krish', 'Arnie', 'Suze'])  # joins a list of words into one string, separated by spaces
'Raj Krish Arnie Suze'
>>> 'All that goes well ends well'.split()  # splits a string into a list of words at whitespace
['All', 'that', 'goes', 'well', 'ends', 'well']
>>> 'Are'+' '+'you'+' '+'feeling'+' '+'well'+'?'  # concatenates strings with the + operator
'Are you feeling well?'
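join() and split() undo each other, and join() is usually tidier than a long chain of + operators. A short sketch using the same example strings as above:

```python
words = ["Raj", "Krish", "Arnie", "Suze"]

# Join the list into one string, then split it back into a list.
sentence = " ".join(words)
round_trip = sentence.split()

# Build the question with join() instead of repeated + operators.
parts = ["Are", "you", "feeling", "well"]
question = " ".join(parts) + "?"

print(sentence)
print(round_trip == words)
print(question)
```

Note that split() with no argument splits on any run of whitespace, so extra spaces between words do not produce empty strings.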