Text and Sentiment Analysis Tutorial

Created by Steve Hoover, Modified on Fri, Apr 19 at 1:10 PM by Steve Hoover

Overview

Text and Sentiment Analysis refers to a body of analytic techniques for evaluating a corpus of text to develop insights about a product, company, service, etc.  The corpus could come from a wide range of sources, such as user comments from review sites, tweets, text available at web sites, 10-K reports, and the like. When humans interpret text, we use our understanding of the content and of the emotional intent behind various words to infer whether a sentence, document, or corpus has a positive or negative tone, and perhaps whether words and sentences in the corpus express various emotional states (e.g., joy, anger, surprise).

In text analysis, we attempt to understand the contents and meaning of the text by exploring questions such as: What are the underlying themes or topics within a text corpus? What does the text tell us about the author(s)? What topics are trending? What keywords (contiguous sets of words) best characterize the contents of the corpus?  These questions are typically easy for humans to answer by reading the text.  But most of us can process perhaps 50 pages of text per hour, whereas top text analysis systems can analyze 200,000 pages per hour or more.  We can use text analysis to create a summary of the contents of the text corpus, extract “important” words and phrases from the corpus, and identify related words and concepts.  Important outputs of text analysis are word clouds summarizing the frequency of occurrence of various words (or word combinations, such as bigrams or trigrams), and the co-occurrence patterns of the words, which help us understand the overall contents and meaning of the corpus. Other types of analyses include document similarity, namely, the extent to which one or more documents within a corpus are similar to other documents.

In sentiment analysis we attempt to summarize the sentiments and emotions embedded in the text corpus, which are subjective aspects of textual content, and not just its objective contents -- we attempt to detect, extract, and assess value judgments, subjective opinions, and the emotional content in text data.  Data useful for text and sentiment analysis are widely available today and can provide companies with valuable insights for making business decisions.  Typical sources of data for text and sentiment analysis in marketing include Facebook and Instagram postings, tweets, reviews from sites such as Yelp and TripAdvisor, and product- or service-specific reviews collected by the supplier of the product or service.

In sum, the summaries generated from text and sentiment analyses provide at least a high-level overview of the meaning and emotions embedded within a text corpus.

Getting Started

You can use your own data or use a template preformatted by the Enginius software. Because the analytical models underlying Sentiment Analysis require a specific data format, users with their own data should review the appropriate preformatted template to become familiar with the data structure. The next section explains how to create an easy-to-use template to enter your own data.

Creating a Template

From the Enginius Dashboard, click the Templates dropdown and select Sentiment Analysis to open the dialog box to create a template.

The following dialog box will appear:

Select the options desired for the sentiment analysis and click “Run” to generate the template for entering your data.

Note that the options for each of the data sources are different, and users are encouraged to go through the template generation process to fully understand the model's data requirements.

Data source

  • Sentiment Data:

This option requires a data block containing the text to be analyzed. The required format is shown below: the first column contains a unique id for each row, and the Verbatim column contains the text to be analyzed for each respondent. The optional Date and Rating columns specify the date of the corresponding text and a rating (on a 1-to-n scale) when the respondent has supplied one to accompany their text.
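For illustration, a small data block with this structure could be created as follows in R (the language of the packages Enginius uses; see the section "R – Packages and routines used in Enginius"). The ids, texts, dates, and ratings below are hypothetical:

# Hypothetical Sentiment data block: a unique id, the Verbatim text,
# and optional Date and Rating columns (here, ratings on a 1-to-5 scale).
reviews <- data.frame(
  id       = 1:3,
  Verbatim = c("Great selection and helpful staff.",
               "The store was clean, but checkout was slow.",
               "I didn't like the product."),
  Date     = as.Date(c("2023-01-05", "2023-01-09", "2023-02-14")),
  Rating   = c(5, 3, 1)
)
reviews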

 

  • Twitter Data:

When analyzing Twitter data, no data template is needed, and you will receive an error message if you try to generate one.  To run Sentiment Analysis with Twitter data, simply select the Sentiment Analysis icon to open the model and then select the Twitter handle or content you would like to analyze (see the Run Analysis section of this tutorial for more details).

  • Web pages:

The template generation dialog box allows you to specify the number of websites to be analyzed and whether to include custom stop words.

 

After clicking “Run”, the following template will be generated:

Entering Your Data

*If using Twitter data as your data source, you may skip directly to the Run Analysis section of the tutorial, as no additional data needs to be entered.

For Sentiment or Web pages data, it is recommended to use the Template feature, or at least review the template format, to ensure that your data is in the correct format. You may enter your data in Enginius in one of three ways:

  • Enter your data directly into the Enginius online template.
  • Copy and paste your data from Excel (or other data source) to Enginius.
  • Download the Enginius template to Excel (using the Save function in Enginius), fill out the data in Excel (make sure to adhere to the template format), and then upload the Excel file back to Enginius (using the Load function in Enginius).

 

 

For the remainder of this tutorial, we will use the “OfficeStar: Sentiment Analysis” data set that loads when you open the Enginius Sentiment tutorial.

 

Run Analysis

When you click on the Run Sentiment Analysis button (or Sentiment Analysis icon), the analysis setup window will be displayed. The setup window will vary depending on your Data source.


Data source

You first need to select the appropriate Data source for the data you are analyzing. Once selected, the remainder of the setup window will adjust to request the inputs appropriate for that type of data.

Sentiment data

When you choose Sentiment data as your data source, the setup window presents the Data options shown above. You will need to select the Enginius data block that contains the Sentiment data and then identify the Verbatim (text to be analyzed), Date (if included), and Rating (if included) columns in that data block.

Twitter data

When the Twitter data option is selected, the data options are either a Twitter handle (e.g., @wsj) or Twitter keyword(s) (e.g., stock market). The software retrieves up to 1,000 corresponding tweets from the past week directly from Twitter for analysis. Note: only one Twitter handle may be searched at a time.
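Outside Enginius, a roughly equivalent retrieval with the rtweet package (one of the packages listed at the end of this tutorial) might look like the sketch below; authentication setup is omitted, and the handle and keywords are the examples from the text:

library(rtweet)

# Assumes Twitter API authentication has already been configured for rtweet.
wsj_tweets    <- get_timeline("wsj", n = 1000)          # tweets from a handle
market_tweets <- search_tweets("stock market", n = 1000,
                               include_rts = FALSE)     # tweets matching keywords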

 

Web pages

When you choose Web pages as your data source, the setup window presents the data options shown below. You will need to select the Enginius data block that contains the list of Web pages that you want to analyze.

Stop words

Typically, text contains many words, called “stop words” (e.g., “and,” “the,” “of,” etc.), that do not perform a significant lexical function. It is best to remove these words to facilitate interpretation of the text. Enginius offers the option of removing a long list of stop words (851), a short list (174), or retaining all the words for analysis (see Appendix B for a list of stop words included in the Enginius lists).  

Enginius also offers the option of including additional custom stop words that can facilitate interpreting documents in a specific domain. You can incorporate these custom words by generating a custom stop word list for a specific application. Often, an initial word cloud generated from the data can provide hints about possible custom stop words, e.g., product names such as “Pepsi” or popular words that do not add any meaning in a given context (e.g., app, game, iPhone). The custom stop words must be entered in the following format: a column header, an index column, and a stop word column:
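As a sketch (the specific words are hypothetical), a custom stop word block with that structure could look like this in R:

# Hypothetical custom stop word list: an index column plus a column of
# domain-specific words that add no meaning in this context.
custom_stop_words <- data.frame(
  index = 1:4,
  word  = c("officestar", "store", "app", "iphone")
)
custom_stop_words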



 

 

Be aware that removing stop words can significantly affect the interpretation of your text. For example, the Short and Long Lists of stop words included with Enginius contain words such as “didn’t” and “not,” which can radically change the meaning of a phrase:

 

Before stop words → After stop words

  • The product is really very good (Positive) → product really good (Positive)
  • The product seems to be good (Positive) → product seems good (Positive)
  • I didn’t like the product (Negative) → like product (Positive)
  • The product is not good (Negative) → product good (Positive)

 

It is recommended to run your analysis both with and without stop words to see how the results change.

 

Advanced options

As shown in the dialog box below, Enginius also provides advanced analysis options, which you can access by clicking the “Advanced” checkbox in the lower left corner. Checking “Advanced” adds two options: “Word co-occurrence analysis and RAKE” and “Topic Model”.  Instead of the single-word analysis used in a word cloud, we can analyze relationships between words.  Specifically, we can explore the co-occurrence of pairs of words within individual documents or within the entire corpus.  The word co-occurrence analysis is then supplemented with RAKE (Rapid Automatic Keyword Extraction), which explores the occurrence of more than two contiguous words and thereby extracts longer sequences of words that help characterize the contents of the documents.  Further details about these methods are provided in the Appendix.

After selecting all the options, click the Run button found at the bottom of the Sentiment analysis setup window to begin the analysis. By default, the report will be output as a web page.

 

Reminder: Clicking the globe icon beside the “Run” option will allow you to choose a different output format for the report.

 

 

You will see a pop-up indicating the progress of the sentiment analysis. Your report will be output in the chosen format (a Microsoft, PDF, or Zip file may automatically download to your hard drive).

 

Sentiment analysis is a time-consuming process, and it may take several minutes for a report to be generated.

 

 

Interpreting the Results

The report generated by sentiment analysis contains several sections, depending on the options chosen.  The results described below were generated with these model settings:

 

Word cloud

The first section presents the word cloud developed from the data.  The word cloud is a graphical representation of the words found in a corpus, where the size and location of a word in the cloud diagram indicate the prominence of the word in the text.  A word cloud is not the output of a statistical model per se.  Rather, it is an analysis of the frequency of occurrence of various words in the input text after preprocessing steps such as removing stop words (typically common words such as the, and, how, and but, which are unlikely to add meaning to a sentence) and stemming, which reduces the number of distinct words in a document by reducing each word to its stem or root (e.g., buyer, buyers, and buying could all be reduced to the stem “buy”).
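A minimal sketch of this kind of preprocessing and word cloud construction, using the tm, SnowballC, and wordcloud packages listed at the end of this tutorial (the texts are hypothetical, and the exact Enginius preprocessing steps may differ):

library(tm)
library(SnowballC)
library(wordcloud)

texts <- c("Great selection of ink cartridges",
           "Helpful staff and a good selection",
           "Buyers keep buying from this store")

corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop stop words
corpus <- tm_map(corpus, stemDocument)                       # reduce words to their stems

tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)     # word (stem) frequencies
wordcloud(names(freq), freq, min.freq = 1)                   # plot the word cloud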

Enginius outputs two versions of word clouds, the first based on the stems (roots) of words and the second based on the un-stemmed words in the text, as in the example below:

Sentiment analysis

If the input text corpus contains information about the date of a specific review/document and an associated rating, then the Enginius output includes additional summaries, as shown below (for a data set consisting of user reviews and ratings of products posted on various dates):

 

Valence analysis

The next section shows the overall valence of the text corpus, a popular application of sentiment analysis.  Valence is the technical term for the “subjective inclination” of a document, measured along a Positive/Neutral/Negative continuum.  Several dictionaries or lexicons are publicly available (e.g., AFINN, NRC, Bing) that contain the valence weights associated with common words and phrases.  This type of analysis can be applied to determine the polarity of a sentence, a document, or the whole corpus.  The Enginius output is for the entire corpus.  (If you need the polarity for a subset of the corpus, you can re-run the analysis including only the selected text.)
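A minimal sketch of lexicon-based valence scoring with the syuzhet package (listed at the end of this tutorial); the review texts are hypothetical and the lexicon choice is illustrative:

library(syuzhet)

reviews <- c("The product is really very good",
             "The product seems to be good",
             "The product is not good")

# Valence per review: positive values indicate positive valence,
# negative values indicate negative valence.
get_sentiment(reviews, method = "afinn")
get_sentiment(reviews, method = "bing")   # same texts scored with the Bing lexicon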

Two additional outputs are useful in analyzing the results, a Valence histogram and a Valence pie chart:


Also included in the sentiment output is the valence evolution, which shows how valence evolves over time if a Date column was included in the analysis:

 

A valence word cloud is also produced, as shown below.

Emotion analysis

The next section of the sentiment analysis report depicts the emotions expressed within the text corpus. 

Similar to earlier sections of the report, a histogram and pie chart of Emotion are also produced.

Using emotions associated with various words as determined from publicly available emotion lexicons (for common emotions such as happy, excited, discouraged, disappointed), we can generate a word cloud of emotions associated with a corpus (shown below).
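A small sketch of emotion scoring with the NRC lexicon via the syuzhet package (listed at the end of this tutorial); the texts are hypothetical:

library(syuzhet)

reviews <- c("I was delighted by the fast, friendly service",
             "The broken printer made me angry and disappointed")

# One row per review, with counts of words associated with each NRC emotion
# (anger, anticipation, disgust, fear, joy, sadness, surprise, trust)
# plus overall negative and positive counts.
get_nrc_sentiment(reviews)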

Results from Advanced analysis options

Word co-occurrences between adjacent words in a corpus

The first output in this section is a word co-occurrence graph that summarizes the relative frequencies of pairs of words that are adjacent to each other when we consider the entire corpus as a single document.  For every word, adjacent words provide a context.  The relative frequencies of pairs of words are compiled into a “co-occurrence matrix” and the important elements from this matrix are displayed as a graph.  We only include for display words that are likely to be most useful for interpretation, namely, nouns, verbs, and adjectives.  In the output here, we see that “good selection,” “great selection,” “ink cartridge,” “helpful staff,” etc. go together often relative to other word pairs in the corpus.  The strength of a connection is highlighted by the depth of the color of the line connecting two words.

Word co-occurrences within sentences in a corpus

The next graph shows word co-occurrence when we consider co-occurrences within sentences, i.e., the pair of words can occur anywhere within a sentence in a document, so the context is broader than the adjacent-word restriction used for the graph above (see Appendix A for additional details). Again, we display only nouns, verbs, and adjectives.  Although there are substantial similarities between the two graphs for this data set, the analysis at the sentence level can alter the strengths of the connections between words.

RAKE Analysis (Rapid Automatic Keyword Extraction)

The next set of results summarizes output from RAKE (Rapid Automatic Keyword Extraction) (see Appendix A for additional details). The small data set used in this tutorial is not sufficient to generate an interesting output.  We basically see the same results that we get from the sentence-level co-occurrence graph, showing that two bigrams (“good selection” and “great selection”) are the most important keywords in this text corpus.


With larger text corpora, RAKE analysis produces more informative outputs, as in the example below.  Note: the example below was not generated from our OfficeStar data but is a representative output from a RAKE analysis.


A related piece of information that Enginius provides is the distribution of parts-of-speech tags in the corpus.  Here, nouns, verbs, and adjectives (which are the only words shown in the co-occurrence graphs above) make up about 60% of all the words in this corpus.

Distribution of Universal Parts of Speech Tags

Topic Model

The final set of results from the advanced analysis is the topic model, which refers to a class of machine learning methods for analyzing and classifying text into broad thematic categories to help us understand the underlying structure of the contents of a text corpus (see Appendix A for additional details).  Topic modeling is a data summarization tool for text, much as factor analysis is a summarization tool for numeric data.  Each topic represents a theme (a grouping of words), but the analysis does not provide a label for each topic.  It is up to you to interpret the theme captured by each topic by examining the most important words in that topic.  Although the example data here are too small to develop a good, interpretable topic model, we can nevertheless see some patterns.  For example, the first topic is likely about the characteristics of a retailer – carries computers, is in town, can walk to shop, good price, etc.   It also contains some discrepant themes, such as “rude” and “slow.”  In Enginius, you can pick up to 15 topics, and for each topic, it reports the top 15 words.  Although each topic is likely to contain a different set of words, some words may appear in more than one topic, indicating their importance to several topics.  Note also that the model is specified on the stemmed text, which adds to the challenge of interpreting the topics.

The output also includes two metrics to help you assess the adequacy of the topic model: Mean topic coherence and Mean topic exclusivity.   Good models will have a mean topic coherence close to zero and a mean topic exclusivity greater than 1.  The model here has a high exclusivity score, suggesting the topics refer to different themes.  There is no clear benchmark for evaluating the computed value of mean topic coherence.  Roughly speaking, with 15 words in a topic (which is what is built into Enginius), a mean topic coherence score smaller than -240 (i.e., a larger negative number) suggests that, on average, any pair of words in a topic is likely to occur together in less than 10% of the documents.  A score smaller than -315 suggests that, on average, any pair of words in a topic is likely to occur together in less than 5% of the documents, and the corresponding score for 1% is about -480.  On this basis, the topics seem to have an adequate level of coherence.  We could consider increasing or decreasing the number of topics to check whether that increases mean topic coherence.

 

Mean topic coherence = -149.367

Coherence is a metric based on the co-occurrence of the top words (say top 15) within a topic in each document of the corpus. For each pair of words in a topic, we compute the log of the probability that a document containing the higher ranked word also contains at least one instance of the lower-ranked word. The overall coherence value for a topic is the sum of scores for each pair of words. A number close to 0 (the highest possible value of this metric) indicates high coherence. We report the average coherence value across the topics. 

Mean topic exclusivity = 13.549

Exclusivity, measured using a metric called FREX, captures the extent to which the top words in a topic are exclusive to that topic (i.e., are not as likely to occur in the other topics). The exclusivity score for each top word in a topic is the harmonic mean of two equally-weighted components: (1) the rate at which the word occurs within a topic relative to its rate of occurrence in the other topics, and (2) the frequency (number of times) of a word’s occurrence within a topic relative to its frequency in the other topics. The exclusivity score for a topic is the average of the exclusivity scores of the words in a topic. The computed score is a positive number (> 0), with values substantially greater than 1 indicating a topic’s high exclusivity. We report the average exclusivity value across the topics. 

 

 

Appendix A

Co-occurrence graphs

In this analysis, we look at two words that co-occur within the same document (e.g., a single tweet). We specify a sliding window of a given size to indicate how far apart the two words can be with respect to each other – are they just one word apart (i.e., next to each other), within two words apart, within three words apart, etc.  This separation context can be specified using a skip-gram value, which, for example, is equal to 1 if we restrict the two words to occur next to each other (a word can occur before or after a focal word).  In Enginius, we do two types of analyses, the first with a default value of 1 for the skip-gram, and the second with a “variable skip-gram” that allows co-occurrence anywhere within a sentence.  In both cases, the program creates a symmetric co-occurrence matrix specifying how many times in the corpus each pair of words occurs together within the specified context (skip-gram).  This matrix is processed further to assess the probabilities of co-occurrence of each pair. The words are then plotted on a co-occurrence graph (word pairs that have higher probabilities of being connected are highlighted graphically with thicker shading).  As a default, we plot up to 100 top words on the graph.
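As a rough sketch of this type of analysis with the udpipe package (listed at the end of this tutorial), the steps below annotate a small hypothetical corpus and then count co-occurrences for adjacent words and for words within the same sentence; the settings are illustrative rather than the exact Enginius defaults:

library(udpipe)

texts <- c("Great selection of ink cartridges and helpful staff.",
           "Good selection, but the checkout line was slow.")

# Annotate the corpus (tokens, lemmas, POS tags) with a pre-trained English model.
model <- udpipe_download_model(language = "english")
model <- udpipe_load_model(model$file_model)
anno  <- as.data.frame(udpipe_annotate(model, x = texts))

# Co-occurrences of adjacent nouns/verbs/adjectives across the corpus.
cooc_adjacent <- cooccurrence(anno$lemma,
                              relevant = anno$upos %in% c("NOUN", "VERB", "ADJ"))

# Co-occurrences of nouns/verbs/adjectives appearing anywhere in the same sentence.
keep <- subset(anno, upos %in% c("NOUN", "VERB", "ADJ"))
cooc_sentence <- cooccurrence(keep, term = "lemma",
                              group = c("doc_id", "sentence_id"))

head(cooc_adjacent)
head(cooc_sentence)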

Parts-of-speech tagging

POS tagging assigns a part of speech to every word in the text, considering the context in which the word occurs. The tags include nouns (people, places, or things, including abstractions such as health or beauty), verbs (which refer to actions and processes), adjectives (which specify properties of nouns), and adverbs (which modify a verb, an adjective, or another adverb).  There are, of course, some ambiguities – for example, words like park or train could be a verb or a noun – but many of these ambiguities can be resolved by the context in which the word occurs.  POS tagging of English words is quite accurate, with most programs achieving over 97% accuracy.

Across a range of English texts, the percentage of nouns varies from 17% to 25%, verbs from 14 to 17%, prepositions from 9 to 14%, pronouns from 5 to 11%, adverbs 6 to 8% and adjectives 3 to 6% (see, for example, http://infomotions.com/blog/2011/02/forays-into-parts-of-speech/).  Lincoln’s Gettysburg address had 18.8% verbs, 15.1% nouns, 10.7% pronouns, 11.4% adverbs, etc. Lincoln’s speech was a bit more “action-oriented” than the average text corpus.
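Continuing the annotation sketch from the co-occurrence section above (the hypothetical data frame anno), the distribution of universal POS tags can be tabulated directly:

# Share of each universal POS tag among all tokens in the annotated corpus.
prop.table(table(anno$upos))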

RAKE (Rapid Automatic Keyword Extraction)

RAKE is based on the observation that keywords (a set of contiguous words) rarely contain punctuation or stop words.   Thus, a candidate keyword is any set of contiguous words (i.e., 2-gram or bigram for two contiguous words, 3-gram or trigram for three contiguous words, and more generally n-gram) that doesn’t contain phrase delimiters or stop words, i.e., candidate keywords are content-bearing words.  Example bigrams are “customer service,” “merry christmas,” and “thank you.”  Example trigrams are “new york times,” “miles per hour,” and “my credit card.”

In this sense, RAKE looks for specific sets of words within each document that characterize (i.e., give meaning to) the document, and thereby provides a more nuanced interpretation of the key contents of a set of documents than is provided by word co-occurrence graphs that only consider pairs of words.   The input parameters for RAKE are a list of stop words, a set of phrase delimiters (e.g., a comma or period signifying the beginning or end of a phrase), a set of word delimiters, and the n-gram level for extraction.

 

Once all possible candidate n-grams in a set of documents are identified, RAKE selects the most relevant ones that characterize the entire corpus.  To do this, it first computes a RAKE score for each word, which is equal to its degree divided by its frequency, deg(w)/freq(w), where degree is the number of times that the word co-occurs with other words within the candidate keywords, and frequency is the total number of times that the word occurs overall in the corpus.  This ratio favors words that predominantly occur in longer keywords, whereas the degree of a word favors words that occur often and in longer candidate keywords, and frequency favors words that occur frequently regardless of the number of words with which they co-occur.  The RAKE score for each keyword is the sum of the RAKE scores of all the words in that keyword.  Because of this summation, RAKE scores typically tend to be higher for longer sequences of words. All candidate keywords are rank-ordered from highest to lowest on their RAKE scores.  In Enginius, we set the minimum frequency for words to be included in the rank-ordered list at 0.01% of the total word count, and we list up to 20 keywords that contain at most four contiguous words each (4-grams).
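A sketch of RAKE keyword extraction with the udpipe package, reusing the annotated data frame anno from the co-occurrence sketch above; the parameter values are illustrative and not the Enginius settings:

library(udpipe)

# RAKE over nouns, verbs, and adjectives, allowing keywords of up to
# four contiguous words; 'anno' is the annotated corpus from the
# co-occurrence sketch above.
rake <- keywords_rake(x = anno, term = "lemma", group = "doc_id",
                      relevant = anno$upos %in% c("NOUN", "VERB", "ADJ"),
                      ngram_max = 4)

# Keywords ranked from highest to lowest RAKE score.
head(rake[order(-rake$rake), c("keyword", "ngram", "rake")], 20)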

For more information about RAKE, please see Rose, Stuart, Dave Engel, Nick Cramer, and Wendy Cowley (2010), “Automatic keyword extraction from individual documents,” in Text Mining: Applications and Theory, edited by Michael W. Berry and Jacob Kogan, John Wiley & Sons, Ltd.

Topic Model

Topic modeling is based on an underlying probability model of a stylized data-generating process that is assumed to have created the latent (hidden) topics in a set of documents.  The stylized data-generating process is as follows: (1) For each document, the author selects topics from an unknown (Dirichlet) probability distribution of topics, where the number of topics is fixed beforehand.  (2) Likewise, the author selects each word for a given topic using an unknown (Dirichlet) probability distribution over a fixed number of words.  A Bayesian model estimates the parameters of the unknown probability distributions such that the estimated probability distributions recover as closely as possible the known set of words in each document.

Here is an example of the stylized process of how two documents focused on two topics, sports and fashion, are crafted by their writers.  Let’s say the first document is mostly about sports but with a few components related to fashion, and let’s say the second document is mostly about fashion but with some discussion of sports.  For both documents, the writers choose a combination of words that characterize these two topics, namely, when some words appear together, they convey something about sports, and other words when they appear together convey something about fashion.  When there are many documents in a corpus, each document will contain a set of topics characterized by various word combinations.  Even though each document may contain only a few topics, a text corpus may contain dozens of topics.  Topic modeling is designed to leverage this stylized notion of how documents are generated, to identify a few topics and the relatively unique combination of words that pertain to each topic.  
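A minimal sketch of fitting such a topic model with the tm, SnowballC, and topicmodels packages listed at the end of this tutorial; the documents, the number of topics, and the seed are hypothetical:

library(tm)
library(SnowballC)
library(topicmodels)

texts <- c("good selection of ink cartridges",
           "helpful staff and great selection",
           "checkout was slow and the staff seemed rude",
           "prices are good and the store is in town")

corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
dtm <- DocumentTermMatrix(corpus)

# Fit an LDA topic model with a fixed number of topics
# (2 here for illustration; Enginius allows up to 15).
lda <- LDA(dtm, k = 2, control = list(seed = 1234))
terms(lda, 15)   # top 15 (stemmed) words for each topic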

To compute mean topic coherence, Enginius uses a metric called the intrinsic UMass measure, based on the empirical conditional log-probability of word co-occurrence, where i denotes a word ranked higher in a topic than word j.  The empirical score for a pair of words in the ordered list of the topic’s words is:

score(w_i, w_j) = log( (D(w_i, w_j) + 1) / D(w_i) )

where D(·) is a counting function that counts the number of documents in which a word (or a pair of words) occurs.  The numerator counts the documents that contain both words, and the denominator counts the documents that contain the higher-ranked word.  The extra 1, a smoothing factor, allows for situations where D(w_i, w_j) = 0.  The topic coherence score is then specified by summing the scores for all pairs of words in the ordered list of the top 15 words in a topic.  With 15 words in a topic, and assuming 10% average overlap of word pairs across documents, the benchmark value for topic coherence is -241.8; with 5% average overlap, the benchmark value is -314.6.   A computed value of topic coherence that is higher than these values (i.e., closer to 0) can be considered to indicate good coherence.
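For concreteness, a small base-R sketch of this coherence calculation for a single topic; dtm_bin and top_words are hypothetical inputs (a binary document-term matrix and the topic's top words ordered from highest to lowest rank):

# UMass coherence for one topic.
#   dtm_bin:   documents x words matrix of 0/1 occurrence indicators
#   top_words: the topic's top words, ordered from highest to lowest rank
umass_coherence <- function(dtm_bin, top_words) {
  score <- 0
  n <- length(top_words)
  for (i in seq_len(n - 1)) {        # word i is ranked higher than word j
    for (j in seq(i + 1, n)) {
      D_i  <- sum(dtm_bin[, top_words[i]] > 0)   # documents containing word i
      D_ij <- sum(dtm_bin[, top_words[i]] > 0 &
                  dtm_bin[, top_words[j]] > 0)   # documents containing both words
      score <- score + log((D_ij + 1) / D_i)
    }
  }
  score   # the topic's coherence; the report averages this across topics
}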

To compute mean topic exclusivity, we use the FREX (Frequency, Exclusivity) measure proposed by Bischof and Airoldi (2012).  This metric captures the extent to which the top words in a topic are exclusive to that topic (i.e., are not as likely to occur in the other topics). The exclusivity score for each top word in a topic is the harmonic mean of two components for which we assign equal weights as default: (1) the rate at which the word occurs within a topic relative to its rate of occurrence in the other topics, and (2) the frequency (number of times) of a word’s occurrence within a topic relative to its frequency in the other topics. The exclusivity score for a topic is the average of the exclusivity scores of the words in a topic.  By default, Enginius considers the top 15 words in a topic. The computed score is a positive number (> 0), with values substantially greater than 1 indicating a topic’s high exclusivity. We report the average exclusivity value across the topics.

Bischof, Jonathan M. and Edoardo M. Airoldi (2012), “Summarizing topical content with word frequency and exclusivity.” In International Conference on Machine Learning, volume 29, Edinburgh, Scotland, UK.

R – Packages and routines used in Enginius

  • tm – text manipulation
  • stringr – remove graphical characters
  • SnowballC – stemming of words
  • wordcloud – plot word clouds
  • RCurl – HTML download
  • XML – HTML text extraction
  • scales – pie chart modifications
  • syuzhet – emotion selection
  • curl, httr, ROAuth, httpuv, rtweet – tweet extraction
  • topicmodels, textmineR, text2vec – LDA model (topic model)
  • textclean, spacy – removing emojis and replacing them with text
  • udpipe – POS, co-occurrence, and RAKE analysis

 

Appendix B

Stop Words – Short List

a

about

above

after

again

against

all

am

an

and

any

are

aren't

as

at

be

because

been

before

being

below

between

both

but

by

can't

cannot

could

couldn't

did

didn't

do

does

doesn't

doing

don't

down

during

each

few

for

from

further

had

hadn't

has

hasn't

have

haven't

having

he

he'd

he'll

he's

her

here

here's

hers

herself

him

himself

his

how

how's

i

i'd

i'll

i'm

i've

if

in

into

is

isn't

it

it's

its

itself

let's

me

more

most

mustn't

my

myself

no

nor

not

of

off

on

once

only

or

other

ought

our

ours

ourselves

out

over

own

same

shan't

she

she'd

she'll

she's

should

shouldn't

so

some

such

than

that

that's

the

their

theirs

them

themselves

then

there

there's

these

they

they'd

they'll

they're

they've

this

those

through

to

too

under

until

up

very

was

wasn't

we

we'd

we'll

we're

we've

were

weren't

what

what's

when

when's

where

where's

which

while

who

who's

whom

why

why's

with

won't

would

wouldn't

you

you'd

you'll

you're

you've

your

yours

yourself

yourselves

 

Stop Words – Long List

able

about

above

abroad

according

accordingly

across

actually

adj

after

afterwards

again

against

ago

ahead

ain't

all

allow

allows

almost

alone

along

alongside

already

also

although

always

am

amid

amidst

among

amongst

an

and

another

any

anybody

anyhow

anyone

anything

anyway

anyways

anywhere

apart

appear

appreciate

appropriate

are

aren't

around

as

a's

aside

ask

asking

associated

at

available

away

awfully

back

backward

backwards

be

became

because

become

becomes

becoming

been

before

beforehand

begin

behind

being

believe

below

beside

besides

best

better

between

beyond

both

brief

but

by

came

can

cannot

cant

can't

caption

cause

causes

certain

certainly

changes

clearly

c'mon

co

co.

com

come

comes

concerning

consequently

consider

considering

contain

containing

contains

corresponding

could

couldn't

course

c's

currently

dare

daren't

definitely

described

despite

did

didn't

different

directly

do

does

doesn't

doing

done

don't

down

downwards

during

each

edu

eg

eight

eighty

either

else

elsewhere

end

ending

enough

entirely

especially

et

etc

even

ever

evermore

every

everybody

everyone

everything

everywhere

ex

exactly

example

except

fairly

far

farther

few

fewer

fifth

first

five

followed

following

follows

for

forever

former

formerly

forth

forward

found

four

from

further

furthermore

get

gets

getting

given

gives

go

goes

going

gone

got

gotten

greetings

had

hadn't

half

happens

hardly

has

hasn't

have

haven't

having

he

he'd

he'll

hello

help

hence

her

here

hereafter

hereby

herein

here's

hereupon

hers

herself

he's

hi

him

himself

his

hither

hopefully

how

howbeit

however

hundred

i'd

ie

if

ignored

i'll

i'm

immediate

in

inasmuch

inc

inc.

indeed

indicate

indicated

indicates

inner

inside

insofar

instead

into

inward

is

isn't

it

it'd

it'll

its

it's

itself

i've

just

k

keep

keeps

kept

know

known

knows

last

lately

later

latter

latterly

least

less

lest

let

let's

like

liked

likely

likewise

little

look

looking

looks

low

lower

ltd

made

mainly

make

makes

many

may

maybe

mayn't

me

mean

meantime

meanwhile

merely

might

mightn't

mine

minus

miss

more

moreover

most

mostly

mr

mrs

much

must

mustn't

my

myself

name

namely

nd

near

nearly

necessary

need

needn't

needs

neither

never

neverf

neverless

nevertheless

new

next

nine

ninety

no

nobody

non

none

nonetheless

noone

no-one

nor

normally

not

nothing

notwithstanding

novel

now

nowhere

obviously

of

off

often

oh

ok

okay

old

on

once

one

ones

one's

only

onto

opposite

or

other

others

otherwise

ought

oughtn't

our

ours

ourselves

out

outside

over

overall

own

particular

particularly

past

per

perhaps

placed

please

plus

possible

presumably

probably

provided

provides

que

quite

qv

rather

rd

re

really

reasonably

recent

recently

regarding

regardless

regards

relatively

respectively

right

round

said

same

saw

say

saying

says

second

secondly

see

seeing

seem

seemed

seeming

seems

seen

self

selves

sensible

sent

serious

seriously

seven

several

shall

shan't

she

she'd

she'll

she's

should

shouldn't

since

six

so

some

somebody

someday

somehow

someone

something

sometime

sometimes

somewhat

somewhere

soon

sorry

specified

specify

specifying

still

sub

such

sup

sure

take

taken

taking

tell

tends

th

than

thank

thanks

thanx

that

that'll

thats

that's

that've

the

their

theirs

them

themselves

then

thence

there

thereafter

thereby

there'd

therefore

therein

there'll

there're

theres

there's

thereupon

there've

these

they

they'd

they'll

they're

they've

thing

things

think

third

thirty

this

thorough

thoroughly

those

though

three

through

throughout

thru

thus

till

to

together

too

took

toward

towards

tried

tries

truly

try

trying

t's

twice

two

un

under

underneath

undoing

unfortunately

unless

unlike

unlikely

until

unto

up

upon

upwards

us

use

used

useful

uses

using

usually

v

value

various

versus

very

via

viz

vs

want

wants

was

wasn't

way

we

we'd

welcome

well

we'll

went

were

we're

weren't

we've

what

whatever

what'll

what's

what've

when

whence

whenever

where

whereafter

whereas

whereby

wherein

where's

whereupon

wherever

whether

which

whichever

while

whilst

whither

who

who'd

whoever

whole

who'll

whom

whomever

who's

whose

why

will

willing

wish

with

within

without

wonder

won't

would

wouldn't

yes

yet

you

you'd

you'll

your

you're

yours

yourself

yourselves

you've

zero

a

how's

i

when's

why's

b

c

d

e

f

g

h

j

l

m

n

o

p

q

r

s

t

u

uucp

w

x

y

z

I

www

amount

bill

bottom

call

computer

con

couldnt

cry

de

describe

detail

due

eleven

empty

fifteen

fifty

fill

find

fire

forty

front

full

give

hasnt

herse

himse

interest

itse”

mill

move

myse”

part

put

show

side

sincere

sixty

system

ten

thick

thin

top

twelve

twenty

abst

accordance

act

added

adopted

affected

affecting

affects

ah

announce

anymore

apparently

approximately

aren

arent

arise

auth

beginning

beginnings

begins

biol

briefly

ca

date

ed

effect

et-al

ff

fix

gave

giving

heres

hes

hid

home

id

im

immediately

importance

important

index

information

invention

itd

keys

kg

km

largely

lets

line

'll

means

mg

million

ml

mug

na

nay

necessarily

nos

noted

obtain

obtained

omitted

ord

owing

page

pages

poorly

possibly

potentially

pp

predominantly

present

previously

primarily

promptly

proud

quickly

ran

readily

ref

refs

related

research

resulted

resulting

results

run

sec

section

shed

shes

showed

shown

showns

shows

significant

significantly

similar

similarly

slightly

somethan

specifically

state

states

stop

strongly

substantially

successfully

sufficiently

suggest

thered

thereof

therere

thereto

theyd

theyre

thou

thoughh

thousand

throug

til

tip

ts

ups

usefully

usefulness

've

vol

vols

wed

whats

wheres

whim

whod

whos

widely

words

world

youd

youre
