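Sample Solutions to the Exercises for Session 3
Problem 1
From "../DATA01/text02/JPN002.txt", extract the sentences that have a native-speaker correction, then compute and output the word count and the type-token ratio for both the original sentence and its correction. A sentence has a correction when the %NTV line immediately following a *JPN002: line contains a sentence.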
In [1]:
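import os
import re

# Read the transcript and split it into lines
with open("../DATA01/text02/JPN002.txt", "r") as f:
    T = f.read().split("\n")

L = []  # learner sentences (*JPN002: lines)
N = []  # native-speaker corrections (%NTV: lines)
for i in T:
    if re.match(r"\*JPN\d\d\d:\t", i):
        # Strip the speaker tag and keep only the sentence
        L.append(re.sub(r"\*JPN\d\d\d:\t", "", i))
    elif re.match(r"%NTV:\t", i):
        N.append(re.sub(r"%NTV:\t", "", i))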
In [2]:
from nltk import word_tokenize

# Type-token ratio (distinct tokens / all tokens) of each learner sentence
TTR_L = []
for i in L:
    w = word_tokenize(i.lower())
    TTR_L.append(len(set(w)) / len(w))

# Type-token ratio of each native-speaker correction
TTR_N = []
for i in N:
    w = word_tokenize(i.lower())
    TTR_N.append(len(set(w)) / len(w))
In [3]:
sum(TTR_L)/len(TTR_L)
Out[3]:
0.9379331681005477
In [4]:
sum(TTR_N)/len(TTR_N)
Out[4]:
0.9703283480966055
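The cells above collect every *JPN002: and %NTV: line, whereas the problem statement asks only for sentences whose %NTV line actually contains a correction, along with their word counts. The following is a minimal sketch of that pairing step, not the course's official solution. It assumes the layout implied by the regular expressions above (a tab after the speaker tag), uses a made-up `SAMPLE` in place of the real file, and swaps `word_tokenize` for a simple regex tokenizer so that it runs without NLTK data. The names `corrected_pairs` and `word_stats` are hypothetical.

```python
import re

# Hypothetical sample in the transcript layout assumed above (not real corpus data)
SAMPLE = "\n".join([
    "*JPN002:\tI like play tennis .",
    "%NTV:\tI like playing tennis .",
    "*JPN002:\tIt is fun .",
    "%NTV:\t",
    "*JPN002:\tThe the dog runs .",
    "%NTV:\tThe dog runs .",
])

LEARNER = re.compile(r"\*JPN\d{3}:\t")
NATIVE = re.compile(r"%NTV:\t?")  # assumption: an empty %NTV line may lack the tab


def corrected_pairs(lines):
    """Return (learner, correction) pairs where a non-empty %NTV line directly follows a *JPN line."""
    pairs = []
    for cur, nxt in zip(lines, lines[1:]):
        if LEARNER.match(cur) and NATIVE.match(nxt):
            learner = LEARNER.sub("", cur, count=1).strip()
            native = NATIVE.sub("", nxt, count=1).strip()
            if native:  # skip sentences without a correction
                pairs.append((learner, native))
    return pairs


def word_stats(sentence):
    """Return (word count, type-token ratio); punctuation is ignored by the regex tokenizer."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    if not tokens:
        return 0, 0.0
    return len(tokens), len(set(tokens)) / len(tokens)


pairs = corrected_pairs(SAMPLE.split("\n"))
for learner, native in pairs:
    n_l, ttr_l = word_stats(learner)
    n_n, ttr_n = word_stats(native)
    print(f"{n_l}\t{ttr_l:.3f}\t{n_n}\t{ttr_n:.3f}")
```

Because the regex tokenizer drops punctuation while `word_tokenize` keeps it as separate tokens, the counts and ratios from this sketch are not directly comparable with the averages shown in Out[3] and Out[4].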
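Problem 2
For each learner essay in "../DATA01/text02/", count the total number of words, the number of occurrences of basic prepositions (the list Prep below), and the number of occurrences of personal pronouns (the list Pron below). Then compute $\frac{\text{preposition count}}{\text{total word count}}$ and $\frac{\text{personal pronoun count}}{\text{total word count}}$.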
Prep = ["in","on","at","of","with","for","from","by","before","after","about","near","until","as"]
Pron = ["I","you","he","she","it","we","they"]
In [5]:
import os
import re

# Read every file in the corpus directory
file_names = os.listdir("../DATA01/text02/")
R_TEXT = []
for file in file_names:
    with open("../DATA01/text02/" + file, "r") as f:
        R_TEXT.append(f.read())

# Split each file into lines
S_TEXT = [r.split("\n") for r in R_TEXT]

# Keep only the learner lines (*JPNxxx:) and join them into one string per file
T_TEXT = []
for text in S_TEXT:
    tmp = []
    for sent in text:
        if re.match(r"\*JPN\d\d\d:\t", sent):
            tmp.append(re.sub(r"\*JPN\d\d\d:\t", "", sent))
    T_TEXT.append(" ".join(tmp))
In [6]:
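Prep = ["in","on","at","of","with","for","from","by","before","after","about","near","until","as"]
Pron = ["I","you","he","she","it","we","they"]
# Compare in lowercase so that sentence-initial forms such as "In" or "He" are counted;
# the pronoun list is lowercased as well, so "I" still matches
Pron_lower = [p.lower() for p in Pron]
N_prep = []
N_pron = []
N_word = []
for i in T_TEXT:
    w = word_tokenize(i)
    n1 = 0  # preposition count
    n2 = 0  # personal pronoun count
    for j in w:
        j = j.lower()
        if j in Prep:
            n1 += 1
        elif j in Pron_lower:
            n2 += 1
    N_prep.append(n1)
    N_pron.append(n2)
    N_word.append(len(w))  # total tokens, punctuation included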
In [7]:
# Relative frequency per essay: count / total word count
Word_prep = []
Word_pron = []
for i, j, k in zip(N_prep, N_pron, N_word):
    Word_prep.append(i / k)
    Word_pron.append(j / k)
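The two ratios can also be computed in one pass per essay and reported next to the file name. The sketch below is only an illustration under stated assumptions: `ESSAYS` is made-up data standing in for the learner text extracted from each file, `relative_frequencies` is a hypothetical helper, and a regex tokenizer replaces `word_tokenize` so the example runs without NLTK. Matching is done in lowercase, and an empty essay returns zeros instead of dividing by zero.

```python
import re

PREP = {"in", "on", "at", "of", "with", "for", "from", "by",
        "before", "after", "about", "near", "until", "as"}
PRON = {"i", "you", "he", "she", "it", "we", "they"}  # lowercased for matching


def relative_frequencies(text):
    """Return (total words, preposition ratio, pronoun ratio) for one essay."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    if total == 0:
        return 0, 0.0, 0.0  # avoid dividing by zero for an empty essay
    n_prep = sum(t in PREP for t in tokens)
    n_pron = sum(t in PRON for t in tokens)
    return total, n_prep / total, n_pron / total


# Hypothetical essays keyed by file name (not real corpus data)
ESSAYS = {
    "JPN001.txt": "I went to school by bus with my friend .",
    "JPN002.txt": "She is in the park .",
}

for name in sorted(ESSAYS):
    total, r_prep, r_pron = relative_frequencies(ESSAYS[name])
    print(f"{name}\t{total}\t{r_prep:.3f}\t{r_pron:.3f}")
```

Using sets for the two word lists keeps each membership test constant-time, which matters little here but scales better to longer lists.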