From 5ad69bf8e1d1d5a359296613c8969a81ad743b7d Mon Sep 17 00:00:00 2001
From: Jonne Saleva <jonne@jonnesaleva.com>
Date: Wed, 26 Feb 2020 18:36:44 -0500
Subject: .

---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3f309e3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,14 @@
+# Etymological clustering of Yiddish words
+
+Yiddish is an interesting language not only because it is low-resource, but also because it has an etymologically diverse lexicon, with words of Germanic, Slavic, and Semitic origin.
+
+While a speaker can often recognize the etymological origin of a word simply by looking at it, it would be interesting to do this automatically.
+
+This motivates several interesting research questions:
+
+1. Given some labeled training data, is it possible to classify Yiddish words accurately and in a way that generalizes to unseen words?
+2. What sort of feature function should we use? Is it possible to *learn* such a feature function using, say, a neural network?
+    - Do we seed the feature function with character counts, tf-idf weights, or just use random projections?
+3. How well can we do with less training data? Is it possible to do this in an unsupervised way?
+4. Is it possible to *jointly* learn the clustering and the feature embedding?
+5. How meaningful are the discovered clusters?
-- 
cgit 1.4.1-2-gfad0