Python: NLTK download corpus
Revision as of 15:34, 5 February 2017 by Onnowpurbo
NLTK corpora can be downloaded with a script, for example download-corpus.py:
import nltk
nltk.download()
Run it:
python download-corpus.py
The interactive downloader menu appears:
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Choose d to download. Fetching every available corpus saves the headache of picking them one by one. The following appears:
Packages:
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
                           2015) subset of the Paraphrase Database.
  [ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
  [-] panlex_lite......... PanLex Lite Corpus
  [ ] pe08................ Cross-Framework and Cross-Domain Parser
                           Evaluation Shared Task
  [-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
                           character properties in Perl
  [ ] porter_test......... Porter Stemmer Test Files
  [-] stopwords........... Stopwords Corpus
  [ ] vader_lexicon....... VADER Sentiment Lexicon
  [ ] wmt15_eval.......... Evaluation data from WMT15
Collections:
  [-] all-corpora......... All the corpora
  [-] all................. All packages
  [-] book................ Everything used in the NLTK Book
([*] marks installed packages; [-] marks out-of-date or corrupt packages)
Download which package (l=list; x=cancel)?
  Identifier>
At the Identifier prompt, enter
all
to keep things simple. Note that this consumes a lot of bandwidth. The output looks like this:
   Downloading collection u'all'
      | 
      | Downloading package abc to /home/onno/nltk_data...
      |   Package abc is already up-to-date!
      | Downloading package alpino to /home/onno/nltk_data...
      |   Package alpino is already up-to-date!
      | Downloading package biocreative_ppi to
      |     /home/onno/nltk_data...
      |   Package biocreative_ppi is already up-to-date!
...
...
and so on ...
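Since downloading all uses a lot of bandwidth, it may be enough to fetch only the packages you need. The sketch below assumes that nltk.download() is called with a package identifier (such as stopwords or vader_lexicon from the list above), which skips the interactive menu. The helper name download_packages and its download parameter are hypothetical, introduced here only for illustration.

```python
def download_packages(package_ids, download=None):
    """Download each NLTK package by identifier; return {identifier: succeeded}.

    `download` is a hypothetical hook for illustration: any callable that takes
    an identifier and returns a truthy value on success. It defaults to
    nltk.download.
    """
    if download is None:
        # Imported lazily so the helper can be tried out without NLTK installed.
        import nltk
        download = nltk.download

    results = {}
    for package_id in package_ids:
        # Passing an identifier avoids the interactive downloader menu.
        results[package_id] = bool(download(package_id))
    return results
```

For example, download_packages(["stopwords", "vader_lexicon"]) would request just those two packages instead of the whole collection.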
The NLTK corpora will be stored in
~/nltk_data/
The full collection is fairly large.
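To see how much disk space the download actually takes, the directory can be measured with the standard library alone. This is a minimal sketch that assumes the corpora live in ~/nltk_data/ as stated above; the function names directory_size_bytes and nltk_data_size_mib are made up for this example.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def nltk_data_size_mib(root=None):
    """Return the size of the NLTK data directory in MiB.

    Assumes the default location ~/nltk_data when no root is given.
    A directory that does not exist counts as 0.
    """
    if root is None:
        root = os.path.expanduser("~/nltk_data")
    return directory_size_bytes(root) / (1024 * 1024)
```

Calling nltk_data_size_mib() after the download gives a rough figure for the space used.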