Let's Become Developers

Python Tokenization

2019. 6. 24. 20:44 · Python/Machine Learning, Deep Learning, Data Analysis, Python

Hands-on English and Korean tokenization with NLTK and KoNLPy

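When I ran the code without nltk.download(), it raised an error. After downloading the data first, it ran normally. The error occurs because word_tokenize depends on NLTK's Punkt tokenizer models and pos_tag depends on its averaged perceptron tagger model, and neither is bundled with the nltk package itself. Calling nltk.download() with no arguments opens the interactive downloader. Passing a package name downloads only that package.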

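import nltk
from nltk.tokenize import word_tokenize

# Fetch only the data packages this post uses instead of opening the
# interactive downloader (NLTK 3.9+ names them 'punkt_tab' and
# 'averaged_perceptron_tagger_eng')
nltk.download('punkt', quiet=True)                       # for word_tokenize
nltk.download('averaged_perceptron_tagger', quiet=True)  # for pos_tag

text = "I am actively looking for Ph.D. students. and you are a Ph.D. student."
print(word_tokenize(text))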

['I', 'am', 'actively', 'looking', 'for', 'Ph.D.', 'students', '.', 'and', 'you', 'are', 'a', 'Ph.D.', 'student', '.']
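A plain whitespace split is not enough here, and the comparison below shows why: str.split() leaves the period attached to "students.". The toy regex tokenizer below is a hypothetical illustration only, not NLTK's actual algorithm (word_tokenize combines Punkt sentence splitting with Treebank-style rules). It assumes simple English text and encodes one idea: dotted abbreviations such as Ph.D. stay whole, and any other punctuation becomes its own token.

```python
import re

# Hypothetical toy tokenizer for illustration only; NLTK's word_tokenize
# uses Punkt sentence splitting plus Treebank-style rules instead.
TOKEN_RE = re.compile(
    r"[A-Za-z]+(?:\.[A-Za-z]+)+\."  # dotted abbreviations such as "Ph.D."
    r"|\w+"                          # ordinary words and numbers
    r"|[^\w\s]"                      # any other single punctuation mark
)


def toy_tokenize(text: str) -> list[str]:
    """Return the word and punctuation tokens in text, left to right."""
    return TOKEN_RE.findall(text)


text = "I am actively looking for Ph.D. students. and you are a Ph.D. student."

print(text.split())        # whitespace split keeps "students." as one token
print(toy_tokenize(text))  # same tokens as the word_tokenize output above
```

The alternatives are tried in order, so the abbreviation pattern gets the first chance at "Ph.D.". A word followed by a lone period, such as "students.", falls through to the plain-word and punctuation patterns. Contractions, URLs, and many other cases that real tokenizers handle are ignored.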

from nltk.tag import pos_tag

x = word_tokenize(text)  # pos_tag expects a list of tokens, not a raw string
pos_tag(x)

[('I', 'PRP'),
 ('am', 'VBP'),
 ('actively', 'RB'),
 ('looking', 'VBG'),
 ('for', 'IN'),
 ('Ph.D.', 'NNP'),
 ('students', 'NNS'),
 ('.', '.'),
 ('and', 'CC'),
 ('you', 'PRP'),
 ('are', 'VBP'),
 ('a', 'DT'),
 ('Ph.D.', 'NNP'),
 ('student', 'NN'),
 ('.', '.')]

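The result above pairs each token with its part-of-speech (POS) tag from the Penn Treebank tagset. For example, PRP is a personal pronoun, VBP a non-3rd-person singular present-tense verb, NNS a plural noun, and NNP a singular proper noun.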
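pos_tag returns ordinary (token, tag) tuples, so filtering and counting them needs only the standard library. The sketch below copies the tagged list from this post as literal data. It relies on the Penn Treebank convention that every noun tag starts with NN (NN, NNS, NNP, NNPS).

```python
from collections import Counter

# The (token, tag) pairs printed by pos_tag(x) earlier in this post
tagged = [('I', 'PRP'), ('am', 'VBP'), ('actively', 'RB'), ('looking', 'VBG'),
          ('for', 'IN'), ('Ph.D.', 'NNP'), ('students', 'NNS'), ('.', '.'),
          ('and', 'CC'), ('you', 'PRP'), ('are', 'VBP'), ('a', 'DT'),
          ('Ph.D.', 'NNP'), ('student', 'NN'), ('.', '.')]

# Keep every token whose Penn Treebank tag marks it as a noun
nouns = [word for word, tag in tagged if tag.startswith('NN')]

# Count how often each tag occurs in the sentence
tag_counts = Counter(tag for _, tag in tagged)

print(nouns)              # ['Ph.D.', 'students', 'Ph.D.', 'student']
print(tag_counts['NNP'])  # 2
```

With live output, you would pass the result of pos_tag(x) in place of the hard-coded list.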
