Cannot train with another language #86

Open
@phamkhactu

Thanks for your excellent work!

I want to train my g2p model on another language, in my case Vietnamese. Sample lexicon entries:

phạc ph a_T5 c2
num n u_T0 m2
rim r i_T0 m2
giẫn gi a3_T4 n2
toăm t oa2_T0 m2
lịu l iu_T5
cựi c u2i_T5
õng o_T4 ng2

I get this error:

INFO:phonetisaurus-train:2023-09-06 17:51:21:  Checking command configuration...
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  Directory does not exist.  Trying to create.
INFO:phonetisaurus-train:2023-09-06 17:51:21:  Checking lexicon for reserved characters: '}', '|', '_'...
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  arpa_path:  train/model.o8.arpa
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  corpus_path:  train/model.corpus
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  dir_prefix:  train
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  grow:  False
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  lexicon_file:  /tmp/tmp53qaxdn7.txt
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  logger:  <Logger phonetisaurus-train (DEBUG)>
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  makeJointNgramCommand:  <bound method G2PModelTrainer._mitlm of <__main__.G2PModelTrainer object at 0x7fd3aae0ec40>>
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  model_path:  train/model.fst
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  model_prefix:  model
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  ngram_order:  8
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq1_del:  False
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq1_max:  2
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq2_del:  True
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq2_max:  2
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  verbose:  True
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  phonetisaurus-align --input=/tmp/tmp53qaxdn7.txt --ofile=train/model.corpus --seq1_del=false --seq2_del=true --seq1_max=2 --seq2_max=2 --grow=false
INFO:phonetisaurus-train:2023-09-06 17:51:21:  Aligning lexicon...
GitRevision: package
Loading input file: /tmp/tmp53qaxdn7.txt
Please provide a valid input file.
ERROR:phonetisaurus-train:2023-09-06 17:51:21:  Alignment failed.  Exiting.
Traceback (most recent call last):
  File "/home/tupk/anaconda3/envs/nlp/bin/phonetisaurus", line 8, in <module>
    sys.exit(main())
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/phonetisaurus/__main__.py", line 74, in main
    do_train(args, casing, env)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/phonetisaurus/__main__.py", line 209, in do_train
    train(lexicon=lexicon, model_path=args.model, corpus_path=args.corpus, env=env)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/phonetisaurus/__init__.py", line 121, in train
    subprocess.check_call(train_cmd, cwd=temp_dir_str, env=env)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['phonetisaurus-train', '--lexicon', '/tmp/tmp53qaxdn7.txt', '--seq2_del', '--verbose']' returned non-zero exit status 1.

If I train with an English lexicon, there is no problem.
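
One thing that may be relevant: the log above mentions a check for the reserved characters '}', '|' and '_', and the phoneme symbols in the Vietnamese lexicon contain '_' (for example a_T5). Whether this is what makes the alignment fail is an assumption; the log does not confirm it. A minimal Python sketch for listing the affected lexicon lines, where `find_reserved` and `sample` are hypothetical names used only for illustration:

```python
import re

# Reserved characters as named in the phonetisaurus-train log line.
RESERVED = re.compile(r"[}|_]")


def find_reserved(lines):
    """Return (line_number, text) pairs for lines containing a reserved character."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if RESERVED.search(text):
            hits.append((number, text))
    return hits


# A few entries copied from the lexicon in this issue.
sample = [
    "phạc ph a_T5 c2",
    "num n u_T0 m2",
    "lịu l iu_T5",
]

for number, text in find_reserved(sample):
    print(f"line {number}: {text}")
```

If lines are reported, renaming the phoneme symbols so they avoid these characters would be one thing to try, though this is untested here.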

How can I fix it?
Thank you
