Handle foreign character web input


On 6/28/19 1:33 PM, Chris Angelico wrote:
> On Sat, Jun 29, 2019 at 6:31 AM Tobiah <toby at tobiah.org> wrote:
>>
>> A guy comes in and enters his last name as Rönngren.
>>
>> So what did the browser really give me; is it encoded
>> in some way, like latin-1?  Does it depend on whether
>> the name was cut and pasted from a Word doc. etc?
>> Should I handle these internally as unicode?  Right
>> now my database tables are latin-1 and things seem
>> to usually work, but not always.
> 
> Definitely handle them as Unicode. You'll receive them in some
> encoding, probably UTF-8, and it depends on the browser. Ideally, your
> back-end library (eg Flask) will deal with that for you.

It varies by browser?
So these records are coming in from all over the world.  How
do people handle possibly assorted encodings that may come in?

I'm using Web2py.  Does the request come in with an encoding
built in?  Is that how people get the proper unicode object?
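
To make the question concrete, here is a minimal sketch of what "getting the proper unicode object" could look like if the framework handed over raw bytes. It assumes the charset, when present, has already been pulled out of the request's Content-Type header; decode_form_value and its parameters are hypothetical names for illustration, not part of Web2py or Flask. It tries the declared charset first, then UTF-8, then cp1252 as a guess for text from older Windows software.

```python
def decode_form_value(raw, declared_charset=None):
    """Decode submitted bytes to str, trying the likeliest encodings first.

    Hypothetical helper: 'declared_charset' is assumed to come from the
    request's Content-Type header, if the client sent one.
    """
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    # UTF-8 is strict enough that non-UTF-8 input usually fails to decode,
    # so it is safe to try before the single-byte fallback.
    candidates += ["utf-8", "cp1252"]
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an encoding name Python does not know.
            continue
    # latin-1 maps every byte to a code point, so this cannot fail.
    return raw.decode("latin-1")


print(decode_form_value("Rönngren".encode("utf-8")))
print(decode_form_value(b"R\xf6nngren"))
```

Both calls should print the same name. The fallback order is a judgment call: a single-byte guess never raises on ordinary text, so it can silently produce the wrong characters, which is why it goes last.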

>> Also, what do people do when searching for a record.
>> Is there some way to get 'Ronngren' to match the other
>> possible foreign spellings?
> 
> Ehh....... probably not. That's a human problem, not a programming
> one. Best of luck.

Well, so I'm at an event.  A guy comes up to me at the kiosk
and says his name is Rönngren.  I can't find him by typing in "ron",
so I ask him how to spell his last name.  What does he say, and
what do I type?
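
One common approach to the kiosk case is to compare accent-folded forms of both the stored name and whatever gets typed. The sketch below assumes names are already held as Unicode strings; fold and matches are hypothetical helper names. It uses the standard unicodedata module to decompose each character and drop the combining marks, so "ö" compares equal to "o".

```python
import unicodedata


def fold(text):
    """Return a lowercase, accent-stripped form of text for comparison only."""
    # NFKD splits 'ö' into 'o' plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query, name):
    """True if the folded name starts with the folded query."""
    return fold(name).startswith(fold(query))


names = ["Rönngren", "Ronaldo", "Smith"]
print([n for n in names if matches("ron", n)])
```

The folded form is only a search key; the original spelling should still be what is stored and displayed. Letters that have no decomposition, such as "ø", pass through unchanged and would need an explicit mapping if they should match "o".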