Rankings, definitions, pronunciations and additional data for NGSL, NAWL, TSL, and BSL

April 4, 2017 · January 7, 2023 · Paul Raine · TEFL

I have generated supplementary data for four word lists (NGSL, NAWL, TSL, and BSL) originally created by Dr. Charles Browne et al. The supplementary data includes:

1. Word: the word (lemma) as it appears on the original list
2. POS: the most common part of speech for the word according to the Moby Part-of-Speech database
3. BNC Rank: the frequency ranking of the word according to the British National Corpus (a lower number means a higher frequency)
4. Google Rank: the frequency ranking of the word according to the Google Corpus (a lower number means a higher frequency)
5. IPA: the International Phonetic Alphabet transcription of the word, using data derived from the CMU Pronouncing Dictionary
6. Conjugations: variations of the form of the word according to tense, person, etc.*
7. Synonyms: a list of words with similar or related meanings*
8–23. Multilingual definitions: Arabic, Chinese, German, Greek, English, French, Italian, Japanese, Korean, Dutch, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish*

*Data provided by public domain dictionary/thesaurus sources, where available.
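
As a rough illustration of how the rank fields might be used, here is a minimal Python sketch. It assumes the data has been saved as a CSV file whose header row uses the field names listed above ("Word", "POS", "BNC Rank", "Google Rank"); the actual file format and column names may differ. The function names are hypothetical, and the sample rows are placeholders with made-up ranks, not values from the real lists.

```python
import csv
import io


def load_rows(handle):
    """Read rows from a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def most_frequent(rows, rank_column="BNC Rank", limit=10):
    """Return up to `limit` rows with the lowest rank (lower rank = more frequent).

    Rows whose rank is missing or not a whole number are skipped.
    """
    ranked = [r for r in rows if (r.get(rank_column) or "").strip().isdigit()]
    ranked.sort(key=lambda r: int(r[rank_column]))
    return ranked[:limit]


# Placeholder rows for illustration only; the words and ranks are made up.
# The column names are an assumption based on the field list above.
SAMPLE = """Word,POS,BNC Rank,Google Rank
alpha,noun,30,12
beta,verb,,7
gamma,noun,5,40
"""

rows = load_rows(io.StringIO(SAMPLE))
top = most_frequent(rows, limit=2)
print([r["Word"] for r in top])
```

To read a real file instead, pass an open file object to `load_rows`, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the data was saved.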

Download the data:

This supplementary data is available under the same license as the original lists: Creative Commons Attribution-ShareAlike 4.0 International License.

7 thoughts on “Rankings, definitions, pronunciations and additional data for NGSL, NAWL, TSL, and BSL”

Zemi 2-03 (10/03) – うらの研究室＠北海学園大学 says:

October 3, 2017 at 12:13 pm

[…] NGSL with definitions […]

Zemi 2-04 (10/10) – うらの研究室＠北海学園大学 says:

October 9, 2017 at 4:39 pm

[…] NGSL with definitions […]

NGSL Redirect | paulsensei.com says:

August 9, 2018 at 8:34 pm

[…] Please see the new blog post. […]

pglove says:

September 19, 2020 at 6:52 pm

Hi, Paul

I want to create a set of study cards based on the NGSL with POS, BNC Rank, Google Rank, IPA, Conjugations, Synonyms, and Definitions in English and the learner’s language.

Do you know if the information you have provided can be reproduced on a commercial basis?

i.e. are there restrictions on use of the information beyond CC 4.0?

I intend to use them in the school I work at, but it’s a lot of work to keep to myself, so monetising seems sensible if possible. If there are restrictions, I could remove those elements in a commercial design.

I know you are not the rights holder of all of these data sets, but perhaps you have some insight you could share.

I’d be very happy to collaborate on such a project, if you are interested.

I have been looking at other corpus data, especially COCA, but they are quite restrictive regarding publication of specific rank and frequency information. They require frequency to be grouped into 20 bands or fewer.

Hope to hear back.

Thanks,

Miguel

Paul Raine says:

September 22, 2020 at 12:16 am

Hi Miguel,

Thanks for getting in touch.
As far as I’m aware, this data cannot be used for commercial purposes, only for non-commercial educational purposes.
Sorry about that.

Best,

Paul

John says:

February 24, 2021 at 6:16 am

Thank you so much

Comment hacker le TOEIC ? (et tout autre test de langue) – Martin`s Site says:

March 12, 2024 at 12:01 am

[…] The other lists are available at the following link. […]
