I read the question in the link above (you may have to click on the title) a couple of months ago, and have been thinking about it on and off for the last little bit. Here are my ideas on creating a basis for the English language using a computer. Please note this is just in note form. When it piques my interest again I’ll come back and rewrite this 🙂
Finding the basis of the English language.
Two matrices. Matrix 1 holds the words and their classifications (word in column 1, classification in column 2), with n rows for n words.
Different senses of the same word (i.e. the animal bear, and can’t bear to see you go) are different words.
Matrix 2 would have n rows and m columns, where m is the length in words of the longest definition. Row i of matrix 2 (the definition) lines up with row i of matrix 1 (the word).
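To make the layout concrete, here is a minimal sketch in Python rather than MATLAB. The entries, the classifications and the helper name row_of are all invented for illustration, and lists of varying length stand in for the padded n-by-m matrix.

```python
# Toy data only: these entries and classifications are made up.
# matrix1: one (word, classification) pair per row.
matrix1 = [
    ("bear1", "noun"),
    ("bear2", "verb"),
    ("animal", "noun"),
]

# matrix2: row i holds the definition of the word in row i of matrix1,
# stored as a list of words instead of a fixed-width row.
matrix2 = [
    ["large", "animal"],
    ["tolerate"],
    ["living", "thing"],
]

# Hypothetical lookup table from a word to its row number.
row_of = {word: i for i, (word, _) in enumerate(matrix1)}

print(matrix2[row_of["animal"]])
```

Keeping a word-to-row lookup next to the two matrices avoids scanning matrix 1 every time a word has to be found.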
Create a third matrix, undefined size, called basis.
Run through matrix 2, inserting each word into basis after checking that the word does not already exist in basis. At the end of this loop, basis will be a large matrix of distinct terms: every word used in any definition, each appearing once.
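A rough sketch of that loop, assuming each definition is already split into a list of words. The function name build_basis is just a placeholder.

```python
def build_basis(matrix2):
    """Collect every word used in any definition, once each, in first-seen order."""
    basis = []
    seen = set()  # fast membership check, so basis itself is not rescanned
    for definition in matrix2:
        for word in definition:
            if word not in seen:
                seen.add(word)
                basis.append(word)
    return basis


# Toy definitions, invented for illustration.
toy_definitions = [
    ["large", "animal"],
    ["living", "thing"],
    ["large", "thing"],
]
print(build_basis(toy_definitions))
```

The set is only there for speed. Checking the basis list directly would give the same result, just more slowly on a full dictionary.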
Create another matrix, smaller than basis, called basis2. Run through all the elements of basis, converting each word to the number of the row in which that word appears in matrix 1. There may be some issues here with words that have multiple meanings.
If the word does not have a definition, keep the word as it is. Store all these new numbers in basis2, removing the corresponding words from basis. After this, you should have 2 matrices, one with only words and the other with only numbers, that together represent the original basis matrix.
Now use the numbers in basis2. For each one, go to the equivalent row of matrix 2 and run the search again: run through the definition and check whether any of its words are not in basis, but this time store those words in a new matrix called secondary.
At this point, you should have a core matrix of words in basis, and some secondary words in secondary. If basis is empty, then the English language is circular. If there is at least 1 word in basis, then there is a basis of undefinable words that make up the language. The words in secondary make up a list of words that refer to each other but that can be used in conjunction with each other to make all the words of the English language.
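The split into words and row numbers could look something like this. It is a sketch only: split_basis and headwords are hypothetical names, and it assumes every headword is unique (so multiple meanings are already dealt with).

```python
def split_basis(basis, headwords):
    """Separate basis into undefined words and row numbers of defined words.

    headwords is column 1 of matrix 1, in row order.
    """
    row_of = {word: i for i, word in enumerate(headwords)}
    # Words with no row in matrix 1 stay behind as plain words.
    remaining = [word for word in basis if word not in row_of]
    # Words that do have a definition are replaced by their row number.
    basis2 = [row_of[word] for word in basis if word in row_of]
    return remaining, basis2


remaining, basis2 = split_basis(
    ["large", "animal", "living"],  # toy basis
    ["bear", "animal", "large"],    # toy headwords
)
print(remaining, basis2)
```

Here "living" has no entry in the toy headword list, so it stays in the word matrix, while "large" and "animal" become rows 2 and 1.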
A few potential problems:
Getting around words with multiple definitions. Perhaps a number system could be arranged, where a number is stored after a word (so bear becomes bear1 and bear2). It would be best to apply this to a smaller group of definitions first, so that potential problems can be spotted and fixed, and so that the method can be refined.
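One possible way to do the numbering, as a sketch. The function name number_senses is made up, and it only labels the headwords. Working out which sense a word inside a definition refers to is the harder part and is not attempted here.

```python
from collections import Counter


def number_senses(headwords):
    """Append 1, 2, ... to headwords that occur more than once."""
    totals = Counter(headwords)  # how many senses each word has
    seen = Counter()             # how many senses have been labelled so far
    labelled = []
    for word in headwords:
        if totals[word] > 1:
            seen[word] += 1
            labelled.append(f"{word}{seen[word]}")
        else:
            labelled.append(word)
    return labelled


print(number_senses(["bear", "animal", "bear"]))
```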
There is also a chance that, if the English language is circular, there will be a potential for infinite loops to appear. This is just something that will have to be considered in the programming, and avoided as best as possible by using escape routes, such as a check for words that have already been visited.
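One common escape route is to remember which words have already been expanded and skip them the second time round. A minimal sketch, assuming the dictionary is held as a mapping from word to list of defining words. The name expand and the toy entries are invented.

```python
def expand(word, definitions):
    """Follow definitions from word, stopping at words already visited.

    Returns (primitives, visited): primitives are the words reached that
    have no definition of their own.
    """
    visited = set()
    primitives = set()
    stack = [word]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # already expanded, so a circular definition cannot loop
        visited.add(current)
        if current not in definitions:
            primitives.add(current)
            continue
        stack.extend(definitions[current])
    return primitives, visited


# Toy dictionary where "big" and "large" define each other.
circular = {"big": ["large"], "large": ["big"]}
print(expand("big", circular))
```

With the circular toy dictionary the loop ends after two words and finds no primitives, which is the small-scale version of the "basis is empty" case.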
A bigger problem is getting a text-based dictionary whose format is easy for a program like MATLAB (which is all I have) to convert into matrices. Once that step is overcome, the rest is fairly easy. If anyone does have any ideas on this last bit, or a plaintext dictionary that they think will be suitable, please let me know, as I wouldn’t mind having a crack at writing this program.
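As an illustration of the conversion step, here is a sketch that assumes a made-up file format of one entry per line, written as "word (classification): definition". Real plaintext dictionaries will almost certainly be laid out differently, so the pattern would need adapting. The names LINE_PATTERN and parse_dictionary are hypothetical.

```python
import re

# Assumed line format (an invention for this sketch):
#   word (classification): definition text
LINE_PATTERN = re.compile(r"^(\w+)\s*\((\w+)\)\s*:\s*(.+)$")


def parse_dictionary(text):
    """Turn dictionary text into matrix1 (word, class) and matrix2 (definition words)."""
    matrix1 = []
    matrix2 = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not in the assumed format
        word, classification, definition = match.groups()
        matrix1.append((word.lower(), classification.lower()))
        # Keep only runs of letters and apostrophes as definition words.
        matrix2.append(re.findall(r"[a-z']+", definition.lower()))
    return matrix1, matrix2


sample = """bear (noun): a large animal
bear (verb): to tolerate
"""
matrix1, matrix2 = parse_dictionary(sample)
print(matrix1)
print(matrix2)
```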
If anyone else thinks they would like to use my idea, they may, I just ask that they let me know before they start, and make sure they mention me in the program.
I don’t know if there is any potential for a program like this to be useful. Perhaps a canonised set of words that form a basis of English could be compared to a similar set of words in another language and hence aid translation. But I don’t have that knowledge, nor do I know if this has been done before, so I leave it open to whoever wants to use it.