Sample Solutions to the Practice Problems for Session 3

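Problem 1

In "../DATA01/text02/JPN002.txt", extract the sentences that have a correction by a native speaker, then compute and output the word count and the type-token ratio for both the original sentences and the corrected sentences. A sentence has a correction when the %NTV line immediately following its *JPN002: line contains a sentence.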

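As a quick illustration of the two measures asked for here, the sketch below computes the word count and the type-token ratio (number of distinct tokens divided by the total number of tokens) for one sentence. The helper name `count_and_ttr` and the sample sentence are made up for illustration, and lowercasing plus whitespace splitting stands in for a real tokenizer.

```python
def count_and_ttr(sentence):
    """Return (token count, type-token ratio) for one sentence.

    Assumption: tokens come from lowercasing and splitting on whitespace,
    which is cruder than a real tokenizer.
    """
    tokens = sentence.lower().split()
    if not tokens:
        # Avoid dividing by zero for an empty sentence.
        return 0, 0.0
    return len(tokens), len(set(tokens)) / len(tokens)

# Six tokens, four distinct types ("the" and "cat" each occur twice).
n_tokens, ttr = count_and_ttr("The cat saw the other cat")
print(n_tokens, ttr)
```

A sentence with no repeated words has a ratio of 1.0, which is why short sentences tend to score close to 1.
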
In [1]:
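import re

# Read the whole transcript; the with-block closes the file automatically.
with open("../DATA01/text02/JPN002.txt","r") as f:
    t = f.read()

T = t.split("\n")

# Note: all learner lines and all correction lines are collected here;
# the two lists are not paired line by line.
L = []  # learner sentences (lines starting with *JPN002:)
N = []  # native-speaker corrections (lines starting with %NTV:)

for i in T:
    # Raw strings keep the backslashes in the patterns intact.
    obj_l = re.match(r"\*JPN\d\d\d:\t",i)
    obj_n = re.match(r"%NTV:\t",i)
    if obj_l:
        s = re.sub(r"\*JPN\d\d\d:\t","",i)
        L.append(s)
    elif obj_n:
        s = re.sub(r"%NTV:\t","",i)
        N.append(s)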
In [2]:
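from nltk import word_tokenize

# Type-token ratio (distinct tokens / total tokens) for each learner sentence.
# word_tokenize also returns punctuation marks as tokens, so they are counted.
TTR_L = []

for i in L:
    i = i.lower()
    w = word_tokenize(i)

    ttr = len(set(w))/len(w)
    TTR_L.append(ttr)

# The same calculation for each native-speaker correction.
TTR_N = []

for i in N:
    i = i.lower()
    w = word_tokenize(i)

    ttr = len(set(w))/len(w)
    TTR_N.append(ttr)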
In [3]:
sum(TTR_L)/len(TTR_L)
Out[3]:
0.9379331681005477
In [4]:
sum(TTR_N)/len(TTR_N)
Out[4]:
0.9703283480966055
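
The cells above collect all learner lines and all correction lines as two separate lists and report the mean type-token ratio of each. One possible way to restrict the analysis to learner sentences that actually have a correction, and to report the word count as well, is to look at each `*JPN...:` line together with the line that immediately follows it. The sketch below does this on a few made-up lines in the same tab-separated layout. The names `paired_sentences`, `simple_tokens`, and `stats` and the sample text are hypothetical, and a regular-expression tokenizer stands in for `word_tokenize` so that the example runs without NLTK data.

```python
import re

LEARNER = re.compile(r"\*JPN\d\d\d:\t(.*)")
NATIVE = re.compile(r"%NTV:\t(.*)")

def simple_tokens(sentence):
    # Stand-in tokenizer (assumption): lowercase words, keeping internal apostrophes.
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())

def paired_sentences(lines):
    """Yield (original, correction) for learner lines whose next line is a non-empty %NTV line."""
    for current, following in zip(lines, lines[1:]):
        m_l = LEARNER.match(current)
        m_n = NATIVE.match(following)
        if m_l and m_n and m_n.group(1).strip():
            yield m_l.group(1), m_n.group(1)

def stats(sentence):
    """Return (word count, type-token ratio) for one sentence."""
    tokens = simple_tokens(sentence)
    return len(tokens), (len(set(tokens)) / len(tokens) if tokens else 0.0)

# Made-up sample in the same layout as the corpus files (not real data).
sample = (
    "*JPN002:\tI like play tennis.\n"
    "%NTV:\tI like playing tennis.\n"
    "*JPN002:\tIt is fun.\n"
    "%NTV:\t\n"
    "*JPN002:\tMy friend friend is kind.\n"
    "%NTV:\tMy friend is kind.\n"
)

pairs = list(paired_sentences(sample.split("\n")))
for original, corrected in pairs:
    print(stats(original), stats(corrected))
```

The second learner sentence in the sample is skipped because its %NTV line is empty, so only two pairs are reported.
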

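Problem 2

For each learner essay in "../DATA01/text02/", count the total number of words, the number of occurrences of the basic prepositions (the list Prep below), and the number of occurrences of the personal pronouns (the list Pron below), then compute $\frac{\text{number of preposition occurrences}}{\text{total number of words}}$ and $\frac{\text{number of personal pronoun occurrences}}{\text{total number of words}}$.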

Prep = ["in","on","at","of","with","for","from","by","before","after","about","near","until","as"]

Pron = ["I","you","he","she","it","we","they"]
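
To make the two ratios concrete, here is a minimal sketch that computes them for a single made-up sentence. The function name `category_ratio` is hypothetical, tokens are compared case-insensitively (an assumption, so that sentence-initial forms such as "In" or "He" also count), and whitespace splitting is used in place of a real tokenizer.

```python
Prep = ["in","on","at","of","with","for","from","by","before","after","about","near","until","as"]
Pron = ["I","you","he","she","it","we","they"]

def category_ratio(tokens, vocabulary):
    """Share of tokens that belong to vocabulary, ignoring case."""
    vocab = {v.lower() for v in vocabulary}
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in vocab)
    return hits / len(tokens)

# Made-up sentence: 13 tokens, 3 prepositions (In, with, after), 2 pronouns (I, she).
tokens = "In the morning I walk with my dog and she runs after me".split()
prep_ratio = category_ratio(tokens, Prep)
pron_ratio = category_ratio(tokens, Pron)
print(prep_ratio, pron_ratio)
```

Note that "my" and "me" are not counted, because Pron contains only the subject forms.
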

In [5]:
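import os
import re

# Assumes the directory contains only the essay text files.
file_names = os.listdir("../DATA01/text02/")

# Read every file as raw text.
R_TEXT = []

for file in file_names:
    with open("../DATA01/text02/"+file,'r') as f:
        text = f.read()
    R_TEXT.append(text)

# Split each file into lines.
S_TEXT = []

for r in R_TEXT:
    text = r.split("\n")
    S_TEXT.append(text)

# Keep only the learner's own lines (*JPNxxx:) and join them into one string per file.
T_TEXT = []

for text in S_TEXT:
    tmp = []
    for sent in text:
        obj = re.match(r"\*JPN\d\d\d:\t",sent)
        if obj:
            s = re.sub(r"\*JPN\d\d\d:\t","",sent)
            tmp.append(s)
    T_TEXT.append(" ".join(tmp))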
In [6]:
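Prep = ["in","on","at","of","with","for","from","by","before","after","about","near","until","as"]
Pron = ["I","you","he","she","it","we","they"]

# Lowercased lookup sets: each token is lowercased before the comparison,
# so sentence-initial forms ("In", "He") and "I" are all matched.
prep_set = {p.lower() for p in Prep}
pron_set = {p.lower() for p in Pron}

N_prep = []  # preposition count per essay
N_pron = []  # personal pronoun count per essay
N_word = []  # total token count per essay

for i in T_TEXT:
    w = word_tokenize(i)
    n_w = len(w)
    n1 = 0
    n2 = 0
    for j in w:
        token = j.lower()
        if token in prep_set:
            n1 += 1
        elif token in pron_set:
            n2 += 1
    N_prep.append(n1)
    N_pron.append(n2)
    N_word.append(n_w)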
In [7]:
Word_prep = []  # prepositions per word, one value per essay
Word_pron = []  # personal pronouns per word, one value per essay

for i,j,k in zip(N_prep,N_pron,N_word):
    w_prep = i/k
    w_pron = j/k
    Word_prep.append(w_prep)
    Word_pron.append(w_pron)
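
The two result lists are in the same order as `file_names`, so they can be lined up with the file names for output. The sketch below shows one way to do that. The file names and numbers are placeholder values invented for the example, not results from the corpus, and `format_report` is a hypothetical helper.

```python
def format_report(names, prep_ratios, pron_ratios):
    """Return one tab-separated line per file, with ratios shown to four decimals."""
    lines = ["file\tprep/word\tpron/word"]
    for name, prep, pron in zip(names, prep_ratios, pron_ratios):
        lines.append(f"{name}\t{prep:.4f}\t{pron:.4f}")
    return lines

# Placeholder values for illustration only (not corpus results).
example_names = ["JPN001.txt", "JPN002.txt"]
example_prep = [0.1, 0.125]
example_pron = [0.05, 0.0625]

report = format_report(example_names, example_prep, example_pron)
print("\n".join(report))
```

In the notebook itself, the call would presumably be `format_report(file_names, Word_prep, Word_pron)`.
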