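9 Complexity in Texts¶
9.1 Complexity from an increasing number of word types¶
- So far we have examined complexity at the level of the individual (simple) sentence.
- At the sentence level, complexity related to a growing number of word types, for example, carried almost no meaning.
- If we compare the two sentences below on this measure, the number of word types is in most cases almost the same as the number of words, because a single sentence rarely repeats a word.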
- The new president is set to be inaugurated on Sunday, but the current president is refusing to step down.
- Forty-five years ago a puma was captured in the Highlands, but reported big cat sightings continued.
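- By contrast, when we compare two texts such as the following (A written by a native speaker, B by a learner), differences in the complexity related to the number of word types are meaningful, if only to a limited extent.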
A
English language education in Japan has a reputation for being unsuccessful. Although there are many reasons for this, related both to the type of education received in school and to cultural factors, in this essay we will focus on the teaching methods used, the ability and attitudes of the teachers themselves, and the high school and university exams which determine the course curriculum. First, it is common for teachers in Japan to teach English in the way that they were taught ? using the so-called grammar-translation method. This method, heavily criticized by researchers and educational experts throughout the world, involves reading sentences in the foreign language (L2), analyzing them grammatically, and then translating them into the students' native language (L1). The reverse is also common. This method turns the classroom into a venue for decoding and problem solving, and virtually ignores the ideas of language as a tool for communication and a product of culture. Another problem with exercises like these is that they build the habit of attempting to find a one-to-one correspondence between grammar and vocabulary in the two languages. Language, of course, is not so simple. This approach reinforces the idea that there is only one correct answer, making language appear like mathematics. Furthermore, it ignores differences in culture which determine the suitability of specific words and phrases in a given context. For the majority of students, the grammar-translation method is not engaging, and is even downright boring. Since motivation is a key factor in language acquisition, a teaching method which undermines that motivation can hardly be desirable. Next, many teachers do not actually use English in the classroom. One of the reasons for this is confidence. Although many younger teachers today have spent time learning English in an English-speaking country, it is still not uncommon to meet a Japanese teacher of English who has never left Japan. 
Though oftentimes they have studied the language diligently, most of them lack the confidence to use it, or fear making mistakes in front of their students. As a result, students have few role models for speaking English. Some teachers even confess to their students (as if bragging) that they cannot speak English at all, or treat the ability to speak English with disdain. This kind of attitude surely must have a negative influence on students. The last major reason we will discuss here is the entrance exam system. Most schools center their curriculum on preparation for high school or university entrance exams. Traditionally these exams only incorporated reading and writing. Recently, a listening element was added; however speaking is still not a part of the testing. As a result, the English classroom in Japan still focuses little on developing conversational skills. Students spend their time, for example, cramming vocabulary from test preparation books without ever seeing it in a real context. Because the studying done in these cramming periods tends to be poorly reinforced, much of the memorized material is lost soon after the exams finish. In the mid 1980's the Japanese government began to take steps to improve English language education in junior and senior high schools by putting native English speaking teachers in the classroom alongside the Japanese teachers. Although this was a commendable step, real improvement in English education will require the Japanese teachers themselves to reexamine their teaching methods and attitudes, and for the high school and university entrance exams to include a conversational element.
B
In Japan, education system is changing fast now. The content of education is reduced and students come to have free time more. Furthermore, 'total education time' is taken in all Japanese junior high school. I think this change is bad and Japanese government must change it to original form rapidly for the following reasons. Firstly, many young people this time cannot read or write basic words (Japanese 'kanji.') And, they cannot calculate, too. These things are need in daily life, even if they don't go to college or university. Originally, Japanese student got better score in reading and calculation than any other country's student few decades ago. For, reading, writing, and calculation were very important in Japanese society. Now, however, this good value in old Japan is being reduced. This is very large problem in Japan. Secondly, there is deep gap between the level of high school education and university education. Many students who don't learn the content of high school education cannot catch up with the class in universities. Furthermore, for example, I am medical student, but I don't learn biology in high school. And there are many students like me. In addition, the care of university to us is nearly nothing. So, the level of the study in technology, medicine and so is going down. This is very large problem in Japan, too. Thirdly, as the content of school education is reduced, at the same time, the curiosity of students seems reduced. The new idea and new device are coming from the curiosity, I think. So, the reduction of it means the down of possibility that the evolutional change in various field will happen. This is very large problem in Japan. In conclusion, there are problems like these in Japan, because of the reduction of basic education. Luckily, the Japanese government is planning to change the education system. I hope this change will be going back to old Japanese school education system.
textA = (
    "English language education in Japan has a reputation for being unsuccessful. Although there are many reasons for this, related both to the type of education received in school and to cultural factors, in this essay we will focus on the teaching methods used, the ability and attitudes of the teachers themselves, and the high school and university exams which determine the course curriculum. First, it is common for teachers in Japan to teach English in the way that they were taught ? using the so-called grammar-translation method. This method, heavily criticized by researchers and educational experts throughout the world, involves reading sentences in the foreign language (L2), analyzing them grammatically, and then translating them into the students' native language (L1). The reverse is also common. This method turns the classroom into a venue for decoding and problem solving, and virtually ignores the ideas of language as a tool for communication and a product of culture. Another problem with exercises like these is that they build the habit of attempting to find a one-to-one correspondence between grammar and vocabulary in the two languages. Language, of course, is not so simple. This approach reinforces the idea that there is only one correct answer, making language appear like mathematics. Furthermore, it ignores differences in culture which determine the suitability of specific words and phrases in a given context. For the majority of students, the grammar-translation method is not engaging, and is even downright boring. Since motivation is a key factor in language acquisition, a teaching method which undermines that motivation can hardly be desirable. Next, many teachers do not actually use English in the classroom. One of the reasons for this is confidence. Although many younger teachers today have spent time learning English in an English-speaking country, it is still not uncommon to meet a Japanese teacher of English who has never left Japan. "
    "Though oftentimes they have studied the language diligently, most of them lack the confidence to use it, or fear making mistakes in front of their students. As a result, students have few role models for speaking English. Some teachers even confess to their students (as if bragging) that they cannot speak English at all, or treat the ability to speak English with disdain. This kind of attitude surely must have a negative influence on students. The last major reason we will discuss here is the entrance exam system. Most schools center their curriculum on preparation for high school or university entrance exams. Traditionally these exams only incorporated reading and writing. Recently, a listening element was added; however speaking is still not a part of the testing. As a result, the English classroom in Japan still focuses little on developing conversational skills. Students spend their time, for example, cramming vocabulary from test preparation books without ever seeing it in a real context. Because the studying done in these cramming periods tends to be poorly reinforced, much of the memorized material is lost soon after the exams finish. In the mid 1980's the Japanese government began to take steps to improve English language education in junior and senior high schools by putting native English speaking teachers in the classroom alongside the Japanese teachers. Although this was a commendable step, real improvement in English education will require the Japanese teachers themselves to reexamine their teaching methods and attitudes, and for the high school and university entrance exams to include a conversational element."
)
textB ="In Japan, education system is changing fast now. The content of education is reduced and students come to have free time more. Furthermore, 'total education time' is taken in all Japanese junior high school. I think this change is bad and Japanese government must change it to original form rapidly for the following reasons. Firstly, many young people this time cannot read or write basic words (Japanese 'kanji.') And, they cannot calculate, too. These things are need in daily life, even if they don't go to college or university. Originally, Japanese student got better score in reading and calculation than any other country's student few decades ago. For, reading, writing, and calculation were very important in Japanese society. Now, however, this good value in old Japan is being reduced. This is very large problem in Japan. Secondly, there is deep gap between the level of high school education and university education. Many students who don't learn the content of high school education cannot catch up with the class in universities. Furthermore, for example, I am medical student, but I don't learn biology in high school. And there are many students like me. In addition, the care of university to us is nearly nothing. So, the level of the study in technology, medicine and so is going down. This is very large problem in Japan, too. Thirdly, as the content of school education is reduced, at the same time, the curiosity of students seems reduced. The new idea and new device are coming from the curiosity, I think. So, the reduction of it means the down of possibility that the evolutional change in various field will happen. This is very large problem in Japan. In conclusion, there are problems like these in Japan, because of the reduction of basic education. Luckily, the Japanese government is planning to change the education system. I hope this change will be going back to old Japanese school education system."
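from nltk import word_tokenize
# Lowercase and tokenize each text (word_tokenize requires NLTK's tokenizer data to be installed)
textA_w = word_tokenize(textA.lower())
textB_w = word_tokenize(textB.lower())
# Distinct tokens (types) in each text
Set_A = set(textA_w)
Set_B = set(textB_w)
# Type Token Ratio = number of types / number of tokens
len(Set_A)/len(textA_w)
0.438871473354232
len(Set_B)/len(textB_w)
0.41494845360824745
- In this way we can compute an index of the complexity related to the number of word types.
- However, as the total number of words produced grows, the same words are used repeatedly, so the Type Token Ratio (the value computed here) does not keep rising. It tends to fall as a text gets longer, which makes texts of different lengths hard to compare.
- Several indices have therefore been proposed (V is the number of distinct words (types), N is the total number of words (tokens)).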
$$ \text{Type Token Ratio} = \frac{V}{N} $$
$$ \text{Mean word frequency} = \frac{N}{V} $$
$$ \text{Guiraud} = \frac{V}{\sqrt{N}} $$
$$ \text{Herdan's } C = \frac{\log V}{\log N} $$
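As a rough illustration, the four indices above can be computed from a token list as follows. This is only a sketch: the function name `lexical_diversity_indices` and the toy sentence are assumptions made for this example, and tokens are split on whitespace rather than with `word_tokenize` so that the snippet runs without NLTK data.

```python
import math

def lexical_diversity_indices(tokens):
    """Return TTR, mean word frequency, Guiraud and Herdan's C for a token list."""
    n = len(tokens)       # N: total number of tokens
    v = len(set(tokens))  # V: number of distinct types
    if n < 2:
        # log N is 0 when N = 1, so Herdan's C would be undefined
        raise ValueError("need at least two tokens")
    return {
        "ttr": v / n,
        "mean_word_frequency": n / v,
        "guiraud": v / math.sqrt(n),
        "herdan_c": math.log(v) / math.log(n),
    }

# Toy input (hypothetical): 6 tokens, 5 types because "the" occurs twice
sample_tokens = "the cat sat on the mat".split()
indices = lexical_diversity_indices(sample_tokens)
print(indices)
```

To apply it to the texts in this section, one could pass `textA_w` or `textB_w` instead of the toy token list.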
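- The most widely used index of this kind is the measure of textual lexical diversity (MTLD).
- It is computed by the following procedure.
1. Count words from the beginning of the text, tracking the running TTR, which starts at 1 and falls as words repeat, until it reaches 0.72 (the default threshold).
2. When the TTR falls below 0.72, save the word count, reset the count, and return to step 1.
3. Stop when the last word of the text has been counted.
4. Compute the mean of the saved word counts.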
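The procedure above can be sketched in a few lines. This is a minimal reading of the steps as listed here, not a reference implementation of MTLD: the function name `simple_mtld` is an assumption, the trailing segment that never reaches the threshold is simply discarded, and `None` is returned when no segment is completed.

```python
def simple_mtld(tokens, threshold=0.72):
    """Mean length of the segments whose running TTR falls below `threshold`."""
    segment_lengths = []  # saved word counts (step 2)
    types = set()         # types seen in the current segment
    count = 0             # tokens seen in the current segment
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count < threshold:
            # The running TTR dropped below the threshold: save and restart
            segment_lengths.append(count)
            types = set()
            count = 0
    if not segment_lengths:
        return None  # the text never fell below the threshold
    # Step 4: mean of the saved word counts (the unfinished last segment is ignored)
    return sum(segment_lengths) / len(segment_lengths)

# Toy input (hypothetical): segments "a b a" (TTR 2/3) and "c c" (TTR 1/2)
print(simple_mtld(["a", "b", "a", "c", "c", "d", "e"]))
```

On real texts the function would be called with a token list such as the output of `word_tokenize`.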
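Exercise 17¶
Compute the MTLD of the two texts above.
9.2 Complexity from an increasing number of sentence structure types¶
- By the same reasoning as for word types, a text that contains a wide variety of sentence structures can be called complex.
- On this premise, we compute the complexity of sentence structure for the two texts above.
- We compute the "distance" between the dependency trees of every pair of sentences in a text and treat the mean of these distances as the variability of sentence structure (that is, the complexity).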
import sys
sys.path.append("../")  # make the course's common/ package importable
from common import utils
import numpy as np
from nltk import sent_tokenize, word_tokenize
import itertools
# Split each text into sentences and enumerate every unordered pair of sentences
sents_A = sent_tokenize(textA)
sents_B = sent_tokenize(textB)
c_A = itertools.combinations(sents_A,2)
c_B = itertools.combinations(sents_B,2)
dist_A = []
dist_B = []
# Text A: distance between the dependency trees of each sentence pair,
# normalized by the token count of the longer sentence
for i,j in c_A:
    V1,E1 = utils.text_to_set_dep_un(i)
    V2,E2 = utils.text_to_set_dep_un(j)
    w1 = len(word_tokenize(i))
    w2 = len(word_tokenize(j))
    dist = utils.edit_dist(E1,E2)/max(w1,w2)
    dist_A.append(dist)
# Text B: the same computation
for i,j in c_B:
    V1,E1 = utils.text_to_set_dep_un(i)
    V2,E2 = utils.text_to_set_dep_un(j)
    w1 = len(word_tokenize(i))
    w2 = len(word_tokenize(j))
    dist = utils.edit_dist(E1,E2)/max(w1,w2)
    dist_B.append(dist)
# Mean pairwise distance = variability of sentence structure in each text
sum(dist_A)/len(dist_A)
0.8650102535687065
sum(dist_B)/len(dist_B)
0.8271951800760516
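The averaging step can also be seen in isolation on hand-made trees. The sketch below does not use the course's `utils` module: each tree is written as a set of hypothetical (head position, dependent position) edges, and a Jaccard distance between edge sets stands in for `utils.edit_dist`, so the numbers are not comparable with the values above.

```python
import itertools

def edge_set_distance(edges_a, edges_b):
    """Jaccard distance between two edge sets (a stand-in, not the course's edit distance)."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a ^ edges_b) / len(union)

def mean_pairwise_distance(trees):
    """Average the distance over every unordered pair of trees."""
    pairs = list(itertools.combinations(trees, 2))
    if not pairs:
        return 0.0
    return sum(edge_set_distance(a, b) for a, b in pairs) / len(pairs)

# Toy trees (hypothetical): edges are (head position, dependent position), 0 = root
tree_1 = {(0, 2), (2, 1), (2, 3)}
tree_2 = {(0, 2), (2, 1), (2, 4), (4, 3)}
tree_3 = {(0, 1), (1, 2)}

variability = mean_pairwise_distance([tree_1, tree_2, tree_3])
print(variability)
```

A text whose sentences all share the same tree shape would score 0 under this stand-in, and the score grows as the shapes diverge, which is the intuition behind the index.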
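Exercise 18¶
The individual files in ../corpus/ICNALE/ are essays written by students from across Asia. Among these essays, find and display the text with the highest and the text with the lowest complexity related to the number of sentence structure types.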
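9.3 Other kinds of complexity in texts¶
- When the complexity indices that have so far been applied to single sentences are applied to a text, it is reasonably valid to compute the index for each sentence in the text and treat the mean as the complexity of the whole text.
- The number of words in each sentence, summed over the text and divided by the number of sentences (the mean sentence length), could serve as an index of the text's constitutional complexity.
- MTLD could serve as an index of the taxonomic complexity of words in a text.
- The mean distance between dependency trees could serve as an index of the taxonomic complexity of syntactic structure in a text.
- The mean depth and mean flatness of the dependency trees could serve as indices of hierarchical complexity in a text.
- When words are the unit
  - Constitutional complexity: number of words
  - Taxonomic complexity: number of word types
- When dependency trees are the unit
  - Taxonomic complexity: number of structure types (distance between dependency trees)
  - Hierarchical complexity: structural features (tree depth, tree flatness)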
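The first of these indices, mean sentence length, can be sketched as follows. The function name `mean_sentence_length` and the regular expressions are assumptions made to keep the example self-contained. With NLTK available, `sent_tokenize` and `word_tokenize` would be the more natural choice and would count punctuation tokens as well.

```python
import re

def mean_sentence_length(text):
    """Total word count of all sentences divided by the number of sentences."""
    # Naive sentence split: break at whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    # Count word-like tokens per sentence (punctuation is not counted)
    word_counts = [len(re.findall(r"[A-Za-z0-9'-]+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)

# Two sentences taken from text A: 7 words and 5 words
sample = "Language, of course, is not so simple. The reverse is also common."
print(mean_sentence_length(sample))
```

The same per-sentence averaging pattern applies to tree depth and tree flatness once those features are available for each sentence.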
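Exercise 19¶
We sometimes express the concept of "physique" by combining height, weight, skeletal frame, muscle mass, and so on. Because height and weight are both taken to make up this concept of "physique", the premise is that these two measurements are related to some degree (as measured height increases, measured weight increases).
On the same reasoning, the constitutional and taxonomic complexity computed over words and the taxonomic and hierarchical complexity computed over dependency trees can be regarded as together making up the "complexity of a text".
For each essay in ../corpus/ICNALE/, compute the four complexity indices described above and draw scatter plots.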