Python: NLTK download corpus
Edited by Onnowpurbo
Revision as of 05:18, 2 February 2017
NLTK corpora can be downloaded with a script, for example download-corpus.py:

    import nltk
    nltk.download()  # opens the interactive NLTK Downloader
Run it:

    python download-corpus.py

The following menu appears:
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Choose d (Download). To save yourself the trouble of picking corpora one by one, you can simply download everything. The following list appears:
Packages:
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
                           2015) subset of the Paraphrase Database.
  [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
  [-] panlex_lite......... PanLex Lite Corpus
  [ ] pe08................ Cross-Framework and Cross-Domain Parser
                           Evaluation Shared Task
  [-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
                           character properties in Perl
  [ ] porter_test......... Porter Stemmer Test Files
  [-] stopwords........... Stopwords Corpus
  [ ] vader_lexicon....... VADER Sentiment Lexicon
  [ ] wmt15_eval.......... Evaluation data from WMT15
Collections:
  [-] all-corpora......... All the corpora
  [-] all................. All packages
  [-] book................ Everything used in the NLTK Book
([*] marks installed packages; [-] marks out-of-date or corrupt packages)
Download which package (l=list; x=cancel)?
  Identifier>
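The status markers in this listing can also be checked with a script, for example after saving the listing to a text file. This is a minimal sketch, assuming the line layout shown above (`[x] identifier.... description`); parse_listing and needs_update are hypothetical helpers, not part of NLTK.

```python
import re

# Meaning of the status markers, as stated in the downloader's legend line.
STATUS = {"*": "installed", "-": "out-of-date or corrupt", " ": "not installed"}

# Matches e.g. "  [-] stopwords........... Stopwords Corpus" and captures
# the marker and the identifier. Wrapped description lines have no "[x]"
# prefix, so they do not match and are skipped.
LINE_RE = re.compile(r"^\s*\[([-* ])\]\s+([\w-]+)")


def parse_listing(text):
    """Map each package/collection identifier in a pasted listing to its status."""
    statuses = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            marker, identifier = match.groups()
            statuses[identifier] = STATUS[marker]
    return statuses


def needs_update(text):
    """Return the identifiers marked [-] (out-of-date or corrupt), sorted."""
    return sorted(i for i, s in parse_listing(text).items() if s == STATUS["-"])
```

Fed the listing above, needs_update() would return the entries marked [-], such as panlex_lite, perluniprops and stopwords.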
At the Identifier> prompt, enter

    all

to keep things simple, but be aware that this uses a lot of bandwidth. The output looks like this:
   Downloading collection u'all'
      | 
      | Downloading package abc to /home/onno/nltk_data...
      |   Package abc is already up-to-date!
      | Downloading package alpino to /home/onno/nltk_data...
      |   Package alpino is already up-to-date!
      | Downloading package biocreative_ppi to
      |     /home/onno/nltk_data...
      |   Package biocreative_ppi is already up-to-date!
...
...
and so on ...
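nltk.download() can also be called with a package or collection identifier, which skips the menu entirely. The sketch below is a hypothetical non-interactive variant of download-corpus.py: the helper names packages_from_args and download_packages and the default package list are assumptions for illustration, and only nltk.download() with its download_dir and quiet parameters comes from NLTK itself.

```python
# Hypothetical helpers; only nltk.download() is a real NLTK call.
DEFAULT_PACKAGES = ["stopwords", "vader_lexicon"]  # example IDs from the listing


def packages_from_args(argv):
    """Return the package IDs given after the script name, or the defaults."""
    return list(argv[1:]) or list(DEFAULT_PACKAGES)


def download_packages(package_ids, download_dir=None):
    """Download each ID without the interactive menu; return the IDs that failed."""
    import nltk  # imported here so the helper above works without NLTK installed

    failed = []
    for pkg in package_ids:
        # nltk.download() returns False when a download error occurs;
        # download_dir=None lets NLTK choose its default data directory.
        if not nltk.download(pkg, download_dir=download_dir, quiet=True):
            failed.append(pkg)
    return failed
```

For example, adding `import sys` and `print(download_packages(packages_from_args(sys.argv)))` at the end of the script and running `python download-corpus.py stopwords` should fetch only the stopwords corpus, while `python download-corpus.py all` requests the same all collection chosen in the menu above. An empty list printed at the end means no download reported an error.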
The data is saved in

    ~/nltk_data/

and takes up a fair amount of disk space.
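To see how much space the download actually takes on your machine, you can total the file sizes under the data directory. A minimal sketch using only the Python standard library, assuming the default ~/nltk_data location; dir_size_bytes and human_size are illustrative names, not NLTK functions.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human_size(n_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    data_dir = os.path.expanduser("~/nltk_data")
    if os.path.isdir(data_dir):
        print(data_dir, human_size(dir_size_bytes(data_dir)))
```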