Readability is the ease with which a reader can understand a written text. In natural language, the readability of text depends on its content (the complexity of its vocabulary and syntax) and its presentation (such as typographic aspects that affect legibility, like font size, line height, character spacing, and line length). Researchers have used various factors to measure readability, such as:
Higher readability eases reading effort and speed for any reader, but it makes a larger difference for those who do not have high reading comprehension.
Readability applies to both natural language and programming languages, though in different forms. In programming, things such as programmer comments, choice of loop structure, and choice of names can determine the ease with which humans can read computer program code.
Numeric readability metrics (also known as readability tests or readability formulas) for natural language tend to use simple measures like word length (by letter or syllable), sentence length, and sometimes some measure of word frequency. They can be built into word processors, can score documents, paragraphs, or sentences, and are a much cheaper and faster alternative to a readability survey involving human readers. They are faster to calculate than more accurate measures of syntactic and semantic complexity. In some cases they are used to estimate appropriate grade level.
Several studies in the 1940s showed that even small increases in readability greatly increase readership in large-circulation newspapers.
In 1947, Donald Murphy of Wallace's Farmer used a split-run edition to study the effects of making text easier to read. He found that reducing the reading level from the 9th to the 6th grade increased readership by 43% for an article on 'nylon'. The result was a gain of 42,000 readers in a circulation of 275,000. He also found a 60% increase in readership for an article on corn, with better responses from people under 35.
Wilbur Schramm interviewed 1,050 newspaper readers. He found that an easier reading style helps to determine how much of an article is read. This was called reading persistence, depth, or perseverance. He also found that people will read less of long articles than of short ones. A story 9 paragraphs long will lose three out of 10 readers by the 5th paragraph. A shorter story will lose only two. Schramm also found that the use of subheads, bold-face paragraphs, and stars to break up a story actually loses readers.
A study in 1947 by Melvin Lostutter showed that newspapers generally were written at a level five years above the ability of average American adult readers.
The reading ease of newspaper articles was not found to have much connection with the education, experience, or personal interest of the journalists writing the stories. It instead had more to do with the convention and culture of the industry. Lostutter argued for more readability testing in newspaper writing. Improved readability must be a "conscious process somewhat independent of the education and experience of the staffs writers."
A study by Charles Swanson in 1948 showed that better readability increases the total number of paragraphs read by 93% and the number of readers reading every paragraph by 82%.
In 1948, Bernard Feld did a study of every item and ad in the Birmingham News of 20 November 1947. He divided the items into those above the 8th-grade level and those at the 8th grade or below. He chose the 8th-grade breakpoint, as that was determined to be the average reading level of adult readers. An 8th-grade text "...will reach about 50% of all American grown-ups," he wrote. Among the wire-service stories, the lower group got two-thirds more readers, and among local stories, 75% more readers. Feld also believed in drilling writers in Flesch's clear-writing principles.
Both Rudolf Flesch and Robert Gunning worked extensively with newspapers and the wire services in improving readability. Mainly through their efforts, within a few years the readability of US newspapers went from the 16th to the 11th-grade level, where it remains today.
The two publications with the largest circulations, TV Guide (13 million) and Reader's Digest (12 million), are written at the 9th-grade level. The most popular novels are written at the 7th-grade level. This is consistent with the finding that the average adult reads at the 9th-grade level. It also shows that, for recreation, people read texts that are two grades below their actual reading level.
George Klare and his colleagues looked at the effects of greater reading ease on Air Force recruits. They found that more readable texts resulted in greater and more complete learning. They also increased the amount read in a given time, and made for easier acceptance.
In the 1880s, English professor L. A. Sherman found that the English sentence was getting shorter. In Elizabethan times, the average sentence was 50 words long. In his own time, it was 23 words long.
Sherman's work established that:
Sherman wrote: "Literary English, in short, will follow the forms of standard spoken English from which it comes. No man should talk worse than he writes, no man should write better than he should talk.... The oral sentence is clearest because it is the product of millions of daily efforts to be clear and strong. It represents the work of the race for thousands of years in perfecting an effective instrument of communication."
In 1889 in Russia, the writer Nikolai A. Rubakin published a study of over 10,000 texts written by everyday people. From these texts, he took 1,500 words he thought most people understood. He found that the main blocks to comprehension are unfamiliar words and long sentences. Starting with his own journal at the age of 13, Rubakin published many articles and books on science and many subjects for the great numbers of new readers throughout Russia. In Rubakin's view, the people were not fools. They were simply poor and in need of cheap books, written at a level they could grasp.
In 1921, Harry D. Kitson published The Mind of the Buyer, one of the first books to apply psychology to marketing. Kitson's work showed that each type of reader bought and read their own type of text. On reading two newspapers and two magazines, he found that short sentence length and short word length were the best contributors to reading ease.
The earliest reading ease assessment is the subjective judgment termed text leveling. Formulas do not fully address the various content, purpose, design, visual input, and organization of a text. Text leveling is commonly used to rank the reading ease of texts in areas where reading difficulties are easy to identify, such as books for young children. At higher levels, ranking reading ease becomes more difficult, as individual difficulties become harder to identify. This has led to better ways to assess reading ease.
In the 1920s, the scientific movement in education looked for tests to measure students' achievement to aid in curriculum development. Teachers and educators had long known that, to improve reading skill, readers—especially beginning readers—need reading material that closely matches their ability. University-based psychologists did much of the early research, which was later taken up by textbook publishers.
Educational psychologist Edward Thorndike of Columbia University noted that, in Russia and Germany, teachers used word frequency counts to match books to students. Word skill was the best sign of intellectual development, and the strongest predictor of reading ease. In 1921, Thorndike published Teachers Word Book, which contained the frequencies of 10,000 words. It made it easier for teachers to choose books that matched class reading skills. It also provided a basis for future research on reading ease.
In 1923, Bertha A. Lively and Sidney L. Pressey published the first reading ease formula. They were concerned that junior high school science textbooks had so many technical words. They felt that teachers spent all class time explaining these words. They argued that their formula would help to measure and reduce the "vocabulary burden" of textbooks. Their formula used five variable inputs and six constants. For each thousand words, it counted the number of unique words, the number of words not on the Thorndike list, and the median index number of the words found on the list. Manually, it took three hours to apply the formula to a book.
After the Lively–Pressey study, people looked for formulas that were more accurate and easier to apply. By 1980, over 200 formulas had been published in different languages. In 1928, Carleton Washburne and Mabel Vogel created the first modern readability formula. They validated it using an outside criterion, and it correlated 0.845 with test scores of students who read and liked the criterion books. It was also the first to introduce the variable of interest to the concept of readability.
In 1934, Edward Thorndike published his formula. He wrote that word skills can be increased if the teacher introduces new words and repeats them often. In 1939, W. W. Patty and W. I. Painter published a formula for measuring the vocabulary burden of textbooks. This was the last of the early formulas that used the Thorndike vocabulary-frequency list.
During the recession of the 1930s, the U.S. government invested in adult education. In 1931, Douglas Waples and Ralph Tyler published What Adults Want to Read About. It was a two-year study of adult reading interests. Their book showed not only what people read but what they would like to read. They found that many readers lacked suitable reading materials: they would have liked to learn but the reading materials were too hard for them.
Lyman Bryson of Teachers College, Columbia University found that many adults had poor reading ability due to poor education. Even though colleges had long tried to teach how to write in a clear and readable style, Bryson found that it was rare. He wrote that such language is the result of a "...discipline and artistry that few people who have ideas will take the trouble to achieve... If simple language were easy, many of our problems would have been solved long ago." Bryson helped set up the Readability Laboratory at the College. Two of his students were Irving Lorge and Rudolf Flesch.
In 1934, Ralph Ojemann investigated adult reading skills, factors that most directly affect reading ease, and causes of each level of difficulty. He did not invent a formula, but a method for assessing the difficulty of materials for parent education. He was the first to assess the validity of this method by using 16 magazine passages tested on actual readers. He evaluated 14 measurable and three reported factors that affect reading ease.
Ojemann emphasized the reported features, such as whether the text was coherent or unduly abstract. He used his 16 passages to compare and judge the reading ease of other texts, a method now called scaling. He showed that even though these factors cannot be measured, they cannot be ignored.
Also in 1934, Ralph Tyler and Edgar Dale published the first adult reading ease formula based on passages on health topics from a variety of textbooks and magazines. Of 29 factors that are significant for young readers, they found ten that are significant for adults. They used three of these in their formula.
In 1935, William S. Gray of the University of Chicago and Bernice Leary of Xavier College in Chicago published What Makes a Book Readable, one of the most important books in readability research. Like Dale and Tyler, they focused on what makes books readable for adults of limited reading ability. Their book included the first scientific study of the reading skills of American adults. The sample included 1,690 adults from a variety of settings and regions. The test used a number of passages from newspapers, magazines, and books—as well as a standard reading test. They found a mean grade score of 7.81 (eighth month of the seventh grade). About one-third read at the 2nd to 6th-grade level, one-third at the 7th to 12th-grade level, and one-third at the 13th–17th grade level.
The authors emphasized that one-half of the adult population at that time lacked suitable reading materials. They wrote, "For them, the enriching values of reading are denied unless materials reflecting adult interests are adapted to their needs." The poorest readers, one-sixth of the adult population, need "simpler materials for use in promoting functioning literacy and in establishing fundamental reading habits."
Gray and Leary then analyzed 228 variables that affect reading ease and divided them into four types:
They found that content was most important, followed closely by style. Third was format, followed closely by organization. They found no way to measure content, format, or organization—but they could measure variables of style. Among the 17 significant measurable style variables, they selected five to create a formula:
In 1939, Irving Lorge published an article that reported other combinations of variables that indicate difficulty more accurately than the ones Gray and Leary used. His research also showed that, "The vocabulary load is the most important concomitant of difficulty." In 1944, Lorge published his Lorge Index, a readability formula that used three variables and set the stage for simpler and more reliable formulas that followed.
By 1940, investigators had:
In 1943, Rudolf Flesch published his PhD dissertation, Marks of a Readable Style, which included a readability formula to predict the difficulty of adult reading material. Investigators in many fields began using it to improve communications. One of the variables it used was personal references, such as names and personal pronouns. Another variable was affixes.
In 1948, Flesch published his Reading Ease formula in two parts. Rather than using grade levels, it used a scale from 0 to 100, with 0 equivalent to the 12th grade and 100 equivalent to the 4th grade. It dropped the use of affixes. The second part of the formula predicts human interest by using personal references and the number of personal sentences. The new formula correlated 0.70 with the McCall-Crabbs reading tests. The original formula is:
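In its commonly cited form, the Reading Ease score is 206.835 minus 1.015 times the average sentence length in words, minus 84.6 times the average number of syllables per word. The following is a minimal Python sketch of that calculation, assuming the word, sentence, and syllable counts have already been obtained (syllable counting is a separate problem not addressed here); the function and parameter names are illustrative.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Return the Reading Ease score from pre-computed counts.

    Assumes the commonly cited constants 206.835, 1.015, and 84.6.
    """
    average_sentence_length = total_words / total_sentences
    average_syllables_per_word = total_syllables / total_words
    return (206.835
            - 1.015 * average_sentence_length
            - 84.6 * average_syllables_per_word)


# Example: a 100-word sample with 5 sentences and 150 syllables.
score = flesch_reading_ease(100, 5, 150)
```

Higher scores indicate easier text, matching the 0 to 100 scale described above.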
Publishers discovered that the Flesch formulas could increase readership by up to 60%. Flesch's work also made an enormous impact on journalism. The Flesch Reading Ease formula became one of the most widely used, tested, and reliable readability metrics. In 1951, Farr, Jenkins, and Patterson simplified the formula further by changing the syllable count. The modified formula is:
In 1975, in a project sponsored by the U.S. Navy, the Reading Ease formula was recalculated to give a grade-level score. The new formula is now called the Flesch–Kincaid grade-level formula. The Flesch–Kincaid formula is one of the most popular and heavily tested formulas. It correlates 0.91 with comprehension as measured by reading tests.
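As commonly cited, the Flesch–Kincaid grade level is 0.39 times the average sentence length in words, plus 11.8 times the average number of syllables per word, minus 15.59. A minimal Python sketch under the same assumption that the counts are supplied by the caller; the names are illustrative.

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Return a U.S. grade-level estimate from pre-computed counts.

    Assumes the commonly cited constants 0.39, 11.8, and 15.59.
    """
    average_sentence_length = total_words / total_sentences
    average_syllables_per_word = total_syllables / total_words
    return (0.39 * average_sentence_length
            + 11.8 * average_syllables_per_word
            - 15.59)


# Example: a 100-word sample with 5 sentences and 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
```

Unlike the Reading Ease score, a higher result here means harder text.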
Edgar Dale, a professor of education at Ohio State University, was one of the first critics of Thorndike's vocabulary-frequency lists. He claimed that they did not distinguish between the different meanings that many words have. He created two new lists of his own. One, his "short list" of 769 easy words, was used by Irving Lorge in his formula. The other was his "long list" of 3,000 easy words, which were understood by 80% of fourth-grade students. However, one has to extend the word lists by regular plurals of nouns, regular forms of the past tense of verbs, progressive forms of verbs etc. In 1948, he incorporated this list into a formula he developed with Jeanne S. Chall, who later founded the Harvard Reading Laboratory.
To apply the formula:
Finally, to compensate for the "grade-equivalent curve," apply the following chart for the Final Score:
|Raw score|Final score|
|4.9 and below|Grade 4 and below|
|9.0–9.9|Grades 13–15 (college)|
|10 and above|Grades 16 and above|
Correlating 0.93 with comprehension as measured by reading tests, the Dale–Chall formula is the most reliable formula and is widely used in scientific research.
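In 1995, Dale and Chall published a new version of their formula with an upgraded word list, the New Dale–Chall readability formula. Its formula is:
Raw score = 64 - 0.95 × PDW - 0.69 × ASL
where PDW is the percentage of difficult words (words not on the Dale–Chall word list) and ASL is the average sentence length in words.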
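The raw score above can be sketched in Python as follows. This is a minimal illustration, assuming PDW is the percentage of words not found in an easy-word list and ASL is the average sentence length in words. The function name and the tiny word set in the usage example are hypothetical stand-ins, not the actual Dale–Chall list.

```python
def new_dale_chall_raw(words, sentence_count, easy_words):
    """Return the New Dale-Chall raw score for a tokenized sample.

    words          -- list of word tokens in the sample
    sentence_count -- number of sentences in the sample
    easy_words     -- set of lowercase words treated as familiar
    """
    difficult = sum(1 for word in words if word.lower() not in easy_words)
    pdw = 100.0 * difficult / len(words)   # percentage of difficult words
    asl = len(words) / sentence_count      # average sentence length
    return 64 - 0.95 * pdw - 0.69 * asl


# Hypothetical easy-word set, for illustration only.
easy = {"the", "cat", "sat", "on", "mat", "today", "now"}
sample = ["The", "cat", "sat", "on", "the", "ontological",
          "mat", "today", "quietly", "now"]
raw = new_dale_chall_raw(sample, 2, easy)
```

In this form a higher raw score corresponds to easier text, since both difficult words and long sentences are subtracted.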
In the 1940s, Robert Gunning helped bring readability research into the workplace. In 1944, he founded the first readability consulting firm dedicated to reducing the "fog" in newspapers and business writing. In 1952, he published The Technique of Clear Writing with his own Fog Index, a formula that correlates 0.91 with comprehension as measured by reading tests. The formula is one of the most reliable and simplest to apply:
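In its commonly cited form, the Fog Index is 0.4 times the sum of the average sentence length and the percentage of complex words, where complex words are usually taken to be those of three or more syllables. A minimal Python sketch, assuming the counts are supplied by the caller and leaving aside the finer rules about which words count as complex; the names are illustrative.

```python
def gunning_fog_index(total_words, total_sentences, complex_words):
    """Return the Fog Index from pre-computed counts.

    complex_words is the number of words with three or more syllables
    (a simplifying assumption; exclusions are not modeled here).
    """
    average_sentence_length = total_words / total_sentences
    percent_complex = 100.0 * complex_words / total_words
    return 0.4 * (average_sentence_length + percent_complex)


# Example: 100 words, 5 sentences, 10 complex words.
fog = gunning_fog_index(100, 5, 10)
```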
In 1963, while teaching English teachers in Uganda, Edward Fry developed his Readability Graph. It became one of the most popular formulas and easiest to apply. The Fry Graph correlates 0.86 with comprehension as measured by reading tests.
Harry McLaughlin determined that word length and sentence length should be multiplied rather than added as in other formulas. In 1969, he published his SMOG (Simple Measure of Gobbledygook) formula:
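As commonly cited, the SMOG grade is 1.0430 times the square root of the polysyllable count scaled to 30 sentences, plus 3.1291, where polysyllables are words of three or more syllables. A minimal Python sketch under that assumption, with counts supplied by the caller; the names are illustrative.

```python
import math


def smog_grade(polysyllable_count, sentence_count):
    """Return the SMOG grade from pre-computed counts.

    The polysyllable count is normalized to a 30-sentence sample
    before the square root is taken.
    """
    scaled = polysyllable_count * 30 / sentence_count
    return 1.0430 * math.sqrt(scaled) + 3.1291


# Example: 12 polysyllabic words found in 10 sentences.
grade = smog_grade(12, 10)
```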
In 1973, a study commissioned by the US military of the reading skills required for different military jobs produced the FORCAST formula. Unlike most other formulas, it uses only a vocabulary element, making it useful for texts without complete sentences. The formula satisfied requirements that it would be:
The formula is:
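As commonly cited, the FORCAST grade level is 20 minus one tenth of the number of single-syllable words in a 150-word sample. A minimal Python sketch under that assumption; the function name is illustrative, and counting the single-syllable words is left to the caller.

```python
def forcast_grade(single_syllable_words):
    """Return the FORCAST grade level.

    single_syllable_words is the number of one-syllable words
    counted in a 150-word sample of the text.
    """
    return 20 - single_syllable_words / 10


# Example: 100 of the 150 sampled words have one syllable.
grade = forcast_grade(100)
```

Because only word-level counts are used, the calculation needs no sentence boundaries, which is why it suits texts without complete sentences.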
The FORCAST formula correlates 0.66 with comprehension as measured by reading tests.
The Golub Syntactic Density Score was developed by Lester Golub in 1974. It is among a smaller subset of readability formulas that concentrate on the syntactic features of a text. To calculate the reading level of a text, a sample of several hundred words is taken from the text. The number of words in the sample is counted, as are the number of T-units. A T-unit is defined as an independent clause and any dependent clauses attached to it. Other syntactical units are then counted and entered into the following table:
1. Words/T-unit: 0.95 × ____ = ____
2. Subordinate clauses/T-unit: 0.90 × ____ = ____
3. Main clause word length (mean): 0.20 × ____ = ____
4. Subordinate clause length (mean): 0.50 × ____ = ____
5. Number of modals (will, shall, can, may, must, would...): 0.65 × ____ = ____
6. Number of be and have forms in the auxiliary: 0.40 × ____ = ____
7. Number of prepositional phrases: 0.75 × ____ = ____
8. Number of possessive nouns and pronouns: 0.70 × ____ = ____
9. Number of adverbs of time (when, then, once, while...): 0.60 × ____ = ____
10. Number of gerunds, participles, and absolute phrases: 0.85 × ____ = ____
Users add the numbers in the right-hand column and divide the total by the number of T-units. Finally, the quotient is entered into the following table to arrive at a final readability score.
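The worksheet procedure above amounts to a weighted sum divided by the number of T-units. A minimal Python sketch using the ten weights listed in the table; the dictionary keys and function name are illustrative, and the final conversion of the quotient into a readability score is not included.

```python
# Weights taken from the ten worksheet rows; key names are illustrative.
GOLUB_WEIGHTS = {
    "words_per_t_unit": 0.95,
    "subordinate_clauses_per_t_unit": 0.90,
    "main_clause_mean_length": 0.20,
    "subordinate_clause_mean_length": 0.50,
    "modals": 0.65,
    "be_have_auxiliaries": 0.40,
    "prepositional_phrases": 0.75,
    "possessives": 0.70,
    "adverbs_of_time": 0.60,
    "gerunds_participles_absolutes": 0.85,
}


def golub_quotient(measures, t_units):
    """Multiply each measure by its weight, sum, and divide by T-units.

    measures -- dict mapping the keys above to observed values;
                missing keys are treated as zero
    t_units  -- number of T-units in the sample
    """
    total = sum(weight * measures.get(name, 0)
                for name, weight in GOLUB_WEIGHTS.items())
    return total / t_units


# Example with only two rows filled in.
quotient = golub_quotient({"words_per_t_unit": 10, "modals": 2}, 5)
```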
For centuries, teachers and educators have seen the importance of organization, coherence, and emphasis in good writing. Beginning in the 1970s, cognitive theorists began teaching that reading is really an act of thinking and organization. The reader constructs meaning by mixing new knowledge into existing knowledge. Because of the limits of the reading ease formulas, some research looked at ways to measure the content, organization, and coherence of text. Although this did not improve the reliability of the formulas, their efforts showed the importance of these variables in reading ease.
Studies by Walter Kintsch and others showed the central role of coherence in reading ease, mainly for people learning to read. In 1983, Susan Kemper devised a formula based on physical states and mental states. However, she found this was no better than word familiarity and sentence length in showing reading ease.
Bonnie Meyer and others tried to use organization as a measure of reading ease. While this did not result in a formula, they showed that people read faster and retain more when the text is organized in topics. Meyer found that a visible plan for presenting content greatly helps readers to assess a text. A hierarchical plan shows how the parts of the text are related. It also aids the reader in blending new information into existing knowledge structures.
Bonnie Armbruster found that the most important feature for learning and comprehension is textual coherence, which comes in two types:
Armbruster confirmed Kintsch's finding that coherence and structure are more help for younger readers. R. C. Calfee and R. Curley built on Bonnie Meyer's work and found that an unfamiliar underlying structure can make even simple text hard to read. They brought in a graded system to help students progress from simpler story lines to more advanced and abstract ones.
Many other studies looked at the effects on reading ease of other text variables, including:
John Bormuth of the University of Chicago looked at reading ease using the new Cloze deletion test developed by Wilson Taylor. His work supported earlier research, including the degree of reading ease suited to each kind of reading. The best level for classroom "assisted reading" is a slightly difficult text that causes a "set to learn," and for which readers can correctly answer 50% of the questions of a multiple-choice test. The best level for unassisted reading is one for which readers can correctly answer 80% of the questions. These cutoff scores were later confirmed by Vygotsky and by Chall and Conard. Among other things, Bormuth confirmed that vocabulary and sentence length are the best indicators of reading ease. He showed that the measures of reading ease worked as well for adults as for children: the things that children find hard are also hard for adults of the same reading level. He also developed several new measures of cutoff scores. One of the best known was the Mean Cloze Formula, which was used in 1981 to produce the Degree of Reading Power system used by the College Entrance Examination Board.
In 1988, Jack Stenner and his associates at MetaMetrics, Inc. published a new system, the Lexile Framework, for assessing readability and matching students with appropriate texts.
The Lexile framework uses average sentence length, and average word frequency in the American Heritage Intermediate Corpus to predict a score on a 0–2000 scale. The AHI Corpus includes five million words from 1,045 published works often read by students in grades three to nine.
The Lexile Book Database has more than 100,000 titles from more than 450 publishers. By knowing a student's Lexile score, a teacher can find books that match his or her reading level.
In 2000, researchers of the School Renaissance Institute and Touchstone Applied Science Associates published their Advantage-TASA Open Standard (ATOS) Reading ease Formula for Books. They worked on a formula that was easy to use and that could be used with any texts.
The project was one of the widest reading ease projects ever. The developers of the formula used 650 normed reading texts and 474 million words from all the text in 28,000 books read by students. The project also used the reading records of more than 30,000 students who read and were tested on 950,000 books.
They found that three variables give the most reliable measure of text reading ease:
They also found that:
Coh-Metrix can be used in many different ways to investigate the cohesion of the explicit text and the coherence of the mental representation of the text. "Our definition of cohesion consists of characteristics of the explicit text that play some role in helping the reader mentally connect ideas in the text." The definition of coherence is the subject of much debate. Theoretically, the coherence of a text is defined by the interaction between linguistic representations and knowledge representations. While coherence can be defined as characteristics of the text (i.e., aspects of cohesion) that are likely to contribute to the coherence of the mental representation, Coh-Metrix measurements provide indices of these cohesion characteristics.
Unlike the traditional readability formulas, artificial intelligence approaches to readability assessment (also known as Automatic Readability Assessment) incorporate myriad linguistic features and construct statistical prediction models to predict text readability. These approaches typically consist of three components: 1. a training corpus of individual texts, 2. a set of linguistic features to be computed from each text, and 3. a machine learning model to predict the readability, using the computed linguistic feature values.
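The three parts described above can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the two-text training corpus is invented, the features are limited to average sentence length and average word length, and the model is a simple nearest-centroid classifier. Real systems use far larger corpora, many more features, and stronger models.

```python
import re


def features(text):
    """Compute two simple features: mean sentence length and mean word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    mean_sentence_length = len(words) / len(sentences)
    mean_word_length = sum(len(w) for w in words) / len(words)
    return (mean_sentence_length, mean_word_length)


def train(corpus):
    """Average the feature vectors of each label to get one centroid per label."""
    grouped = {}
    for text, label in corpus:
        grouped.setdefault(label, []).append(features(text))
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    vector = features(text)
    return min(
        centroids,
        key=lambda label: sum((c - v) ** 2
                              for c, v in zip(centroids[label], vector)),
    )


# 1. A toy training corpus (invented for illustration).
toy_corpus = [
    ("The dog ran. The cat sat.", "easy"),
    ("Notwithstanding considerable methodological disagreement, researchers "
     "subsequently established comprehensive evaluation procedures.", "hard"),
]
# 2. Features are computed inside train(); 3. the model is the set of centroids.
model = train(toy_corpus)
label = predict(model, "A bird sang. It was fun.")
```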
In 2012, Sowmya Vajjala at the University of Tübingen created the WeeBit corpus by combining educational articles from the Weekly Reader website and BBC-Bitesize website, which provide texts for different age groups. In total, there are 3125 articles that are divided into 5 readability levels (from age 7 to 16). The WeeBit corpus has been used in several AI-based readability assessment studies.
Wei Xu (University of Pennsylvania), Chris Callison-Burch (University of Pennsylvania), and Courtney Napoles (Johns Hopkins University) introduced the Newsela corpus to the academic field in 2015. The corpus is a collection of thousands of news articles leveled to different reading complexities by professional editors at Newsela. The corpus was originally introduced for text simplification research, but was also used for text readability assessment.
The type-token ratio is one of the features often used to capture lexical richness, a measure of vocabulary range and diversity. To measure the lexical difficulty of a word, the relative frequency of the word in a representative corpus like the Corpus of Contemporary American English (COCA) is often used. Some examples of lexico-semantic features used in readability assessment follow.
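As one illustration of such a feature, the type-token ratio is the number of distinct words (types) divided by the total number of words (tokens). A minimal Python sketch, assuming a simple lowercase regular-expression tokenizer; the tokenization rule and function name are illustrative choices, and real systems often normalize for text length.

```python
import re


def type_token_ratio(text):
    """Return distinct words divided by total words, ignoring case."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Six tokens, four distinct types: the, cat, saw, other.
ratio = type_token_ratio("The cat saw the other cat")
```

A ratio near 1 means almost every word is new, while a lower ratio indicates more repetition.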
In addition, Lijun Feng pioneered cognitively motivated features (mostly lexical) in 2009, during her doctoral study at the City University of New York (CUNY). The cognitively motivated features were originally designed for adults with intellectual disability, but were shown to improve readability assessment accuracy in general. Cognitively motivated features, in combination with a logistic regression model, can correct the average error of Flesch–Kincaid grade-level by more than 70%. The newly discovered features by Feng include:
Syntactic complexity is correlated with longer processing times in text comprehension. It is common to use a rich set of these syntactic features to predict the readability of a text. The more advanced variants of syntactic readability features are frequently computed from parse trees. Emily Pitler (University of Pennsylvania) and Ani Nenkova (University of Pennsylvania) are considered pioneers in evaluating parse-tree syntactic features and making them widely used in readability assessment. Some examples include:
The accuracy of readability formulas increases when finding the average readability of a large number of works. The tests generate a score based on characteristics such as statistical average word length (which is used as an unreliable proxy for semantic difficulty; sometimes word frequency is taken into account) and sentence length (as an unreliable proxy for syntactic complexity) of the work.
Most experts agree that simple readability formulas like Flesch–Kincaid grade-level can be highly misleading. Even though the traditional features like the average sentence length have high correlation with reading difficulty, the measure of readability is much more complex. The Artificial Intelligence (AI), data-driven approach (see above) was studied to tackle this shortcoming.
Writing experts have warned that an attempt to simplify the text only by changing the length of the words and sentences may result in text that is more difficult to read. All the variables are tightly related. If one is changed, the others must also be adjusted, including approach, voice, person, tone, typography, design, and organization.
Writing for a class of readers other than one's own is very difficult. It takes training, method, and practice. Among those who are good at this are writers of novels and children's books. The writing experts all advise writers, besides using a formula, to observe all the norms of good writing, which are essential for writing readable texts. Writers should study the texts used by their audience and their reading habits. This means that for a 5th-grade audience, the writer should study and learn good quality 5th-grade materials.