
Verified error fixes and readability improvement #8


Open · wants to merge 6 commits into master

Conversation

ahwang16

This version is verified to work on Google Colab with a GPU and Python 3. It fixes typos in the original version that prevented the program from running, eliminates the dependency on contrib, and improves readability by renaming a couple of variables and adjusting spacing.

ahwang16 added 6 commits June 26, 2020 21:51
Fix typos that made the code fail to run. Eliminate dependency on contrib (because it is not actively maintained) by replacing methods with equivalent core or addon methods (in one case, the logic is the same as the contrib source code).
Fix critical errors, eliminate dependency on contrib
Reformat spacing (including indentation and spaces between operators) and adjust variable names to be more compliant with PEP8 guidelines and improve readability.
This commit fixes critical errors that prevented the program from running, eliminates the dependency on contrib (since contrib is not well maintained), and improves readability by renaming some variables and being more compliant with PEP8 guidelines.
@glicerico

Thanks for this PR @ahwang16 .
Did you train on Switchboard? Can you please share what accuracy your trained model achieved?

@ahwang16
Author

I did train on Switchboard, but I don't have much information because my summer internship ended in August and I have not returned to Google yet. Without any sort of parameter search/optimization, my trained model achieved a test accuracy of 0.743 (using 300-dimensional GloVe embeddings). I later used BERT embeddings, which pushed the accuracy up to 0.809.
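For reference, a minimal sketch of how the word ids in this format can be mapped to pretrained 300-dimensional vectors. The toy vocabulary and random rows here are stand-ins, not from the PR; in practice the rows of the embedding matrix would be loaded from a GloVe file such as glove.6B.300d.txt.

```python
import numpy as np

EMB_DIM = 300
# Toy vocabulary (assumption); id 0 reserved for unknown tokens.
vocab = {"<unk>": 0, "hello": 1, "world": 2}

# Random vectors stand in for real GloVe rows so the sketch is
# self-contained; replace with vectors read from glove.6B.300d.txt.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), EMB_DIM)).astype("float32")

def embed_utterance(word_ids):
    # Integer-array indexing looks up one 300-d vector per token id,
    # giving an array of shape (n_tokens, 300) for the first layer.
    return embedding_matrix[word_ids]

vectors = embed_utterance([1, 2, 0])
print(vectors.shape)  # (3, 300)
```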

@glicerico

Thanks, sounds quite promising, especially using BERT embeddings.
Can you offer some insight into how to use real data?
Currently I see the test data declared as:

data = [[[1,2,3,4],[1,2,3],[2,3,5]],[[1,0], [4]],[[1,2,8,4],[1,1,3],[2,3,9,1,3,1,9]], [[1,2,3,4,5,7,8,9],[9,1,2,4],[8,9,0,1,2]],[[1,2,4,3,2,3],[9,8,7,5,5,5,5,5,5,5,5]],[[1,2,3,4,5,6,9],[9,1,0,0,2,4,6,5,4]],[[1,2,3,4,5,6,7,8,9],[9,1,2,4],[8,9,0,1,2]],[[1]] , [[1,2,11,2,3,2,1,1,3,4,4], [6,5,3,2,1,1,4,5,6,7], [9,8,1], [1,6,4,3,5,7,8], [0,9,2,4,6,2,4,6], [5,2,2,5,6,7,3,7,2,2,1], [0,0,0,1,2,7,5,3,7,5,3,6], [1,3,6,6,3,3,3,5,6,7,2,4,2,1], [1,2,4,5,2,3,1,5,1,1,2], [9,0,1,0,0,1,3,3,5,3,2], [0,9,2,3,0,2,1,5,5,6], [9,0,0,1,4,2,4,10,13,11,12], [0,0,1,2,3,0,1,1,0,1,2], [0,0,1,3,1,12,13,3,12,3], [0,9,1,2,3,4,1,3,2]]]
labels = [[1,2,1],[0, 3],[1,2,1],[1,0,2], [2,1], [1,1], [2,1,2], [4], [0,1,2,0,2,4,2,1,0,1,0,2,1,2,0]]

I guess the labels correspond to speech acts. What about the arrays of data? Are they word ids for each utterance?

@ahwang16
Author

ahwang16 commented Jan 31, 2021

For real data, I used the Switchboard Dialogue Act Corpus (SWDA) and an internal dataset (not sure if it is available to the public).

The multilayered structure of data can be pretty confusing. In the sample test data, each integer is a token, a list of tokens is an utterance, and a list of utterances is a dialogue. The labels correspond to dialogue acts at the utterance level.

So you can understand the data format like:

data =
[
    [
         [word1, word2, ...], # utt 1 of dialogue 1
         [word1, ... ], # utt 2 of dialogue 1
         ...
    ] # dialogue 1
    ...
] # entire dataset

labels = 
[
    [DA of utt 1, DA of utt 2, ...], # dialogue 1
    ...
]

For the two-tier architecture in this code, the data should be parsed at the utterance level for the first layer and the dialogue level for the second layer.
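The conversion from raw text into this nested format can be sketched as follows. The helper below is hypothetical (not part of the PR): it builds a vocabulary on the fly and returns dialogues as lists of utterances, each a list of word ids, matching the structure above.

```python
# Hypothetical helper: convert raw text dialogues into the nested
# [dialogue][utterance][token id] format described above.
def build_dataset(raw_dialogues):
    vocab = {}
    data = []
    for dialogue in raw_dialogues:
        utts = []
        for utt in dialogue:
            # setdefault assigns the next free id to unseen tokens.
            ids = [vocab.setdefault(tok, len(vocab)) for tok in utt.split()]
            utts.append(ids)
        data.append(utts)
    return data, vocab

raw = [["hello there", "hi how are you"], ["fine thanks"]]
data, vocab = build_dataset(raw)
# data[0] holds the utterances of dialogue 1 as lists of word ids;
# labels would be a parallel list with one dialogue-act tag per utterance.
print(data)  # [[[0, 1], [2, 3, 4, 5]], [[6, 7]]]
```

From here, the first layer consumes each inner list of token ids (one utterance at a time), and the second layer consumes the per-dialogue sequence of utterance encodings.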

@glicerico

Thanks for the description!
