Verified error fixes and readability improvement #8

ahwang16 · 2020-06-29T14:26:01Z

This version is verified to work on Google Colab using a GPU and Python 3. It fixes typos in the original version that prevented the program from running, eliminates the dependency on contrib, and improves readability by renaming a couple of variables and tidying the spacing.

Fix typos that made the code fail to run. Eliminate dependency on contrib (because it is not actively maintained) by replacing methods with equivalent core or addon methods (in one case, the logic is the same as the contrib source code).

Fix critical errors, eliminate dependency on contrib

Reformat spacing (including indentation and spaces between operators) and adjust variable names to be more compliant with PEP8 guidelines and improve readability.

This commit fixes critical errors that prevented the program from running, eliminates the dependency on contrib (since contrib is not well maintained), and improves readability by renaming some variables and being more compliant with PEP8 guidelines.

glicerico · 2021-01-29T23:55:33Z

Thanks for this PR, @ahwang16.
Did you train on Switchboard? Could you share what accuracy your trained model achieved?

ahwang16 · 2021-01-30T01:18:47Z

I did train on Switchboard, but I don't have much information because my summer internship ended in August and I have not returned to Google yet. Without any sort of parameter search/optimization, my trained model achieved a test accuracy of 0.743 (using 300-dimensional GloVe embeddings). I later used BERT embeddings, which pushed the accuracy up to 0.809.

glicerico · 2021-01-30T23:13:03Z

Thanks, that sounds quite promising, especially with BERT embeddings.
Can you offer some insight into how to use real data?
Currently I see the test data declared as:

data = [
    [[1, 2, 3, 4], [1, 2, 3], [2, 3, 5]],
    [[1, 0], [4]],
    [[1, 2, 8, 4], [1, 1, 3], [2, 3, 9, 1, 3, 1, 9]],
    [[1, 2, 3, 4, 5, 7, 8, 9], [9, 1, 2, 4], [8, 9, 0, 1, 2]],
    [[1, 2, 4, 3, 2, 3], [9, 8, 7, 5, 5, 5, 5, 5, 5, 5, 5]],
    [[1, 2, 3, 4, 5, 6, 9], [9, 1, 0, 0, 2, 4, 6, 5, 4]],
    [[1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 1, 2, 4], [8, 9, 0, 1, 2]],
    [[1]],
    [
        [1, 2, 11, 2, 3, 2, 1, 1, 3, 4, 4],
        [6, 5, 3, 2, 1, 1, 4, 5, 6, 7],
        [9, 8, 1],
        [1, 6, 4, 3, 5, 7, 8],
        [0, 9, 2, 4, 6, 2, 4, 6],
        [5, 2, 2, 5, 6, 7, 3, 7, 2, 2, 1],
        [0, 0, 0, 1, 2, 7, 5, 3, 7, 5, 3, 6],
        [1, 3, 6, 6, 3, 3, 3, 5, 6, 7, 2, 4, 2, 1],
        [1, 2, 4, 5, 2, 3, 1, 5, 1, 1, 2],
        [9, 0, 1, 0, 0, 1, 3, 3, 5, 3, 2],
        [0, 9, 2, 3, 0, 2, 1, 5, 5, 6],
        [9, 0, 0, 1, 4, 2, 4, 10, 13, 11, 12],
        [0, 0, 1, 2, 3, 0, 1, 1, 0, 1, 2],
        [0, 0, 1, 3, 1, 12, 13, 3, 12, 3],
        [0, 9, 1, 2, 3, 4, 1, 3, 2],
    ],
]
labels = [
    [1, 2, 1],
    [0, 3],
    [1, 2, 1],
    [1, 0, 2],
    [2, 1],
    [1, 1],
    [2, 1, 2],
    [4],
    [0, 1, 2, 0, 2, 4, 2, 1, 0, 1, 0, 2, 1, 2, 0],
]

I guess the labels correspond to speech acts. What about the arrays of data? Are they word ids for each utterance?

ahwang16 · 2021-01-31T00:30:47Z

For real data, I used the Switchboard Dialogue Act Corpus (SWDA) and an internal dataset (not sure if it is available to the public).

The multilayered structure of data can be pretty confusing. In the sample test data, each integer is a token, a list of tokens is an utterance, and a list of utterances is a dialogue. The labels correspond to dialogue acts at the utterance level.

So you can read the data format like this:

data =
[
    [
        [word1, word2, ...], # utt 1 of dialogue 1
        [word1, ... ], # utt 2 of dialogue 1
        ...
    ] # dialogue 1
    ...
] # entire dataset

labels = 
[
    [DA of utt 1, DA of utt 2, ...], # dialogue 1
    ...
]
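A quick way to sanity-check real data against this layout is to confirm that every dialogue has exactly one label per utterance. The following is only a minimal sketch: `check_alignment` is a hypothetical helper written for illustration, not something from the PR code, and the sample values are the first two dialogues of the toy data quoted above.

```python
from typing import List

Utterance = List[int]        # token ids of one utterance
Dialogue = List[Utterance]   # utterances of one dialogue


def check_alignment(data: List[Dialogue], labels: List[List[int]]) -> int:
    """Return the total number of utterances, or raise ValueError on a mismatch.

    Hypothetical helper: it only checks the shape of the nested lists,
    not whether token ids or dialogue-act ids are valid.
    """
    if len(data) != len(labels):
        raise ValueError(
            f"{len(data)} dialogues but {len(labels)} label lists"
        )
    for i, (dialogue, dialogue_labels) in enumerate(zip(data, labels)):
        if len(dialogue) != len(dialogue_labels):
            raise ValueError(
                f"dialogue {i}: {len(dialogue)} utterances "
                f"but {len(dialogue_labels)} labels"
            )
    return sum(len(dialogue) for dialogue in data)


# First two dialogues of the toy data from the discussion.
sample_data = [[[1, 2, 3, 4], [1, 2, 3], [2, 3, 5]], [[1, 0], [4]]]
sample_labels = [[1, 2, 1], [0, 3]]
print(check_alignment(sample_data, sample_labels))  # 5 utterances in total
```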

For the two-tier architecture in this code, the data should be parsed at the utterance level for the first layer and the dialogue level for the second layer.
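One common way to feed such nested lists to a two-level model is to pad them into a rectangular [dialogues, utterances, tokens] layout and keep the true lengths for both levels. The sketch below is an assumption about how this could be done, not a description of what the PR code does: `pad_dialogues` and `pad_id` are hypothetical names, and the input is assumed to be non-empty. Note that the toy data uses 0 as a real token id, so with `pad_id=0` the returned lengths are what distinguish padding from tokens.

```python
from typing import List, Tuple

Dialogue = List[List[int]]  # a dialogue is a list of utterances (lists of token ids)


def pad_dialogues(
    data: List[Dialogue], pad_id: int = 0
) -> Tuple[List[List[List[int]]], List[List[int]], List[int]]:
    """Pad nested token ids to a rectangular layout and record true lengths.

    Returns (padded, utt_lengths, dialogue_lengths):
      padded[d][u][t]     token id, or pad_id where there is no token
      utt_lengths[d][u]   real token count of utterance u (0 for padded rows)
      dialogue_lengths[d] real utterance count of dialogue d
    """
    max_utts = max(len(dialogue) for dialogue in data)
    max_tokens = max(len(utt) for dialogue in data for utt in dialogue)

    padded, utt_lengths, dialogue_lengths = [], [], []
    for dialogue in data:
        missing_utts = max_utts - len(dialogue)
        # Utterance level: pad each utterance to max_tokens.
        rows = [utt + [pad_id] * (max_tokens - len(utt)) for utt in dialogue]
        # Dialogue level: add all-padding utterances up to max_utts.
        rows += [[pad_id] * max_tokens for _ in range(missing_utts)]
        padded.append(rows)
        utt_lengths.append([len(utt) for utt in dialogue] + [0] * missing_utts)
        dialogue_lengths.append(len(dialogue))
    return padded, utt_lengths, dialogue_lengths


padded, utt_lengths, dialogue_lengths = pad_dialogues([[[1, 2, 3], [4]], [[5, 6]]])
print(padded)            # [[[1, 2, 3], [4, 0, 0]], [[5, 6, 0], [0, 0, 0]]]
print(utt_lengths)       # [[3, 1], [2, 0]]
print(dialogue_lengths)  # [2, 1]
```

The utterance lengths would serve the first (utterance-level) layer and the dialogue lengths the second (dialogue-level) layer, so that padded positions can be masked out at each tier.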

glicerico · 2021-02-01T05:56:32Z

Thanks for the description!

ahwang16 added 6 commits June 26, 2020 21:51

Merge pull request #1 from ahwang16/ahwang16-patch-1

eb1a886

Fix critical errors, eliminate dependency on contrib

Improve readability

cde0789

Update README.md

ea8f648

Import tf v1 and replace toy dataset

1b2f8d8
